DOCUMENT RESUME

ED 097 702                                        CS 201 625

AUTHOR          Diederich, Paul B.
TITLE           Measuring Growth in English.
INSTITUTION     National Council of Teachers of English, Urbana, Ill.
PUB DATE        74
NOTE            107p.
AVAILABLE FROM  National Council of Teachers of English, 1111 Kenyon
                Road, Urbana, Illinois 61801 (Stock No. 03460, $2.50
                nonmember, $2.25 member)

EDRS PRICE      MF-$0.75 HC-$5.40 PLUS POSTAGE
DESCRIPTORS     *Composition (Literary); *English Instruction;
                Evaluation Methods; *Grading; Higher Education;
                Language Arts; *Measurement Techniques; Secondary
                Education; *Test Reliability; Writing Skills

ABSTRACT 

The monograph is a complete outline for a program 
designed to help English departments institute logical and fair 
procedures for grading student essays. The contents in this monograph 
include "Factors in Judgments of Writing Ability," "The Effect of 
Bias," "Measuring Improvement in Writing," "Personal vs Staff
Grading," "Standard Scores for Test Essays," "Computing the
Reliability of Essay Grades," "Computing the Reliability of Objective
Tests," "Design for an Examination in English Language Arts," and
"Initiating Staff Grading of Test Essays." The appendixes, which
comprise the second half of this monograph, include "Descriptions of
Papers Rated High, Middle, and Low on Eight Qualities," "Topics for
Essays," "Objective Items Based on a Central Theme," "Discrete Types
of Objective Items," and "Learning to Write." (EB)






Measuring Growth in English



Paul B. 
Diederich 
Senior Research Associate 
Educational Testing Service 



National Council of Teachers of English



NCTE EDITORIAL BOARD Charles F. Cooper, Richard Corbin, Bernice
Cullinan, Richard Lloyd-Jones, Owen Thomas, Robert F. Hogan, ex officio,
Paul O'Dea, ex officio.

COVER DESIGN Bob Bingenheimer. STAFF EDITOR Carol Schanche.

Library of Congress Catalog Card Number 74-84480
NCTE Stock Number 03460

Copyright 1974 by the National Council of Teachers of English.
All rights reserved. Printed in the United States of America.






1 Introduction

2 Factors in Judgments of Writing Ability

3 The Effect of Bias

4 Measuring Improvement in Writing

5 Personal vs Staff Grading

6 Standard Scores for Test Essays

7 Computing the Reliability of Essay Grades

8 Computing the Reliability of Objective Tests

9 Design for an Examination in English Language Arts

10 Initiating Staff Grading of Test Essays

Appendices

A Descriptions of Papers Rated High, Middle, and Low on Eight Qualities

B Topics for Test Essays

C Objective Items Based on a Central Theme

D Discrete Types of Objective Items

E Learning to Write

Glossary



BOSWELL: Sir Alexander Dick tells me that he remembers having a thousand
people in a year to dine at his house; that is, reckoning each person as
one each time that he dined there.

JOHNSON: That, Sir, is about three a day.

BOSWELL: How your statement lessens the idea!

JOHNSON: That, Sir, is the good of counting. It brings everything to a
certainty, which before floated in the mind indefinitely.

BOSWELL: But Omne ignotum pro magnifico [Everything unknown passes for
marvelous]: one is sorry to have this diminished.

JOHNSON: Sir, you should not allow yourself to be delighted with error.

BOSWELL: Three a day seem but few.

Boswell's Life of Samuel Johnson, April 18, 1783






Foreword 

Somehow the teaching of English has been wrenched out of the Age of 
Aquarius and thrust into the Age of Accountability. Many of us view edu- 
cational accountants in much the same spirit as we view the agent of the 
Internal Revenue Service coming to audit our returns. Theoretically, it is 
possible the agent will turn out to be a pleasant person, gregarious and af- 
fable, who writes poetry in his free time and who will help us by showing 
how we failed to claim all our allowable deductions, so that the result of 
the audit is the discovery of a new friend and a substantial refund. But 
somehow we doubt that possibility. 

For the specialist in measurement and testing we have our image, too. 
In his graduate work, one of the foreign languages he studied was statis- 
tics. And he passed it. The other one was that amazing and arcane lan- 
guage the testing specialists use when they talk to one another. He passed 
it, too, and is fluent in it. He doesn't think of children except as they
distribute themselves across deciles. He attempts with his chi-squares to
measure what we've done without ever understanding what we were trying to
do. Not so with the author of this monograph. 

Paul Diederich, an eminent specialist in testing and measurement, is as 
pleasant a surprise as the IRS agent described above. The surprise begins 
with his academic background: three degrees in Latin and Greek classics 
from Harvard and Columbia. It extends through his first teaching
assignment: high school Latin. It continues to this day. He still publishes articles
on classical subjects and may be the only testmaker who reads Latin and 
Greek for pleasure at the age of 68. 

The question remains, "But does he know anything about teaching and
testing in English?" Fortunately, yes. Just after he began teaching Latin,
the Great Depression set in, and soon both students and their parents
became far more interested in survival than in the classical tradition.
Noting the decline in his classes, he projected that by 1940 he would be
down to zero students. So, "like a rat deserting a sinking ship" as he
expresses it, he swam over to a language that appeared to have a future,
namely English, and soon became an Associate Professor and Examiner in
English at the University of Chicago. Meanwhile he had been a member of
the Evaluation Staff of the Eight-Year Study and helped develop several
tests, including a measure of interests in twelve subjects, now called AIM
(Academic Interest Measures), the only instrument inherited from that
study that is still published by Educational Testing Service. At Chicago,
the Board of Examiners was called upon to develop a large number of tests
for the United States Armed Forces Institute, and over two million
servicemen received school or college credit in English through tests
developed by Diederich and his associates.

In 1949, soon after Educational Testing Service was formed by a merger
of three non-profit testing agencies, its president Henry Chauncey went to
the Middle West looking for fresh blood for his Research Division. He
came back with Diederich and then discovered that they had been
classmates at Harvard. During the teacher shortage, Diederich had a hand in
promoting the employment of college-educated housewives to help high
school English teachers deal with their overload of student compositions.
These were first called "lay readers" but soon became "English
assistants" when it was found that they were equally effective in supervising
independent reading rooms. The latter enabled English teachers to cut their
large classes in half by teaching one section Tuesday and Wednesday, the
other Thursday and Friday; the section that was not in class went to
independent reading. On Monday there was a large-group presentation in the
auditorium, and the teachers who were not involved had this day free for
conferences with students.

The initial conception that led to this monograph was not Diederich's.
It started at ETS with a colleague who is a friend of both Diederich and
NCTE. The thought was to gather together into a single collection a
variety of manuscripts and published articles by Diederich to make available
to English teachers ideas and insights from his lifetime of experience and
research in the teaching and measurement of English.

As has happened so many times before, Diederich gave more than was
asked for. Having consented to the original plan, he worried with us about
the inevitable occurrences of repetition of ideas among papers on related
topics. What we thought was an editorial problem he took as a writing
problem. His solution was to write an entirely fresh manuscript. It follows.



Executive Secretary, NCTE



Introduction



As a test of writing ability, no test is as convincing to teachers of
English, to teachers in other departments, to prospective employers, and to
the public as actual samples of each student's writing, especially if the
writing is done under test conditions in which one can be sure that each
sample is the student's own unaided work. People who uphold the view
that essays are the only valid test of writing ability are fond of using the
analogy that, whenever we want to find out whether young people can
swim, we have them jump into a pool and swim. If they can swim the
length of the pool and back, the evidence is undeniable that they can swim.

But suppose we already knew that all of these young people could swim
somehow or other (some well, others badly) and the test was to find out
how well each one could swim. Then we might use five judges, each of
whom would independently write on his scorecard a number from 1 (poor)
to 5 (excellent) indicating his opinion of each person's swimming. Then
suppose that over a long period of time, at every level from elementary
school through college, and in several countries, everyone who tried this
procedure reported that about a fifth of the swimmers received every grade
from 1 to 5 and only a handful received less than three different grades
from the five judges. Wouldn't this cast some doubt on the reliability of
this test of swimming?

This is the situation we usually face in grading essays as a test of writing
ability. We already know that practically everyone who is admitted to the
test will write something. Our task is to determine how well each one
writes. Then we must use judges, and their judgments are likely to scatter
even more widely than judgments of performance in sports, since there are
well-defined standards for most sports but standards for writing are
neither well defined nor widely accepted. The principal task of this booklet
will be to suggest ways of improving the reliability of grades on essays. We
shall find that it is very hard to reach a desirable standard of reliability
through essays alone, and so we shall also consider the inclusion of a few
sections of objective items on related parts of proficiency in English. Since
objective items yield far higher reliabilities than essays per unit of time,
they will usually increase the reliability of the total score on the
examination to a level that is fair to students.

But why measure or grade at all? I hesitate to answer this question
because, to anyone who buys or borrows a booklet with this title, the question
is silly, the answer is obvious, and it is tedious to repeat the old twaddle
about the need for accurate information on which to base educational
decisions, and the like. But just now there is a vocal minority among English
teachers who oppose any use of grades or measures that enter the
permanent records of students, especially those that indicate weaknesses, and
they are likely to introduce a resolution at the next NCTE meeting
condemning the procedures recommended in this booklet unless something is
said in defense of these procedures.
First let me surprise these critics by saying that I agree with practically
everything they say. This is not a rhetorical trick. I really mean it. During
my twenty-five years at the Educational Testing Service, one of my
principal duties has been consulting with secondary schools on problems of
measurement, grading, record-keeping, and reporting. I have had to visit more
classes than I care to remember, and my predominant impression has
been that these classes are fantastically over-evaluated. Students are
graded on practically everything they do every time they turn around.
Grades generate anxiety and hard feelings between students, between
students and teachers, between students and their parents, and between
parents and teachers. Common sense suggests that they ought to be
reduced to the smallest possible number necessary to find out how students
are getting along toward the four or five main objectives of the program,
but teachers keep piling them up like squirrels gathering nuts. They
appear to have no idea that there is any way to find out how much
measurement of any objective is enough.

Of course there is, and they should have learned it in some course or
unit on tests and measurements. If they have not, they will certainly know
it by the time they finish reading this booklet. The answer is reliability.
This concept will be fully understood only after studying and trying out the
procedures I recommend, but the general idea is that there are quick and
easy ways to find the amount of random variation in all measurement
operations, and from that amount one can tell how much more evidence of
the same kind is needed to reach a stable figure that will not change very
much, or in very many cases, no matter how much more evidence is
gathered.
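
One standard way to put this general idea into numbers is the Spearman-Brown formula. The passage above does not name a formula, so its use here is an assumption, and the function names and the sample reliability of .50 are purely illustrative. The sketch predicts the reliability obtained when the amount of evidence is multiplied by some factor, and the factor needed to reach a chosen target.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when the amount of evidence of the same kind
    is multiplied by length_factor (Spearman-Brown formula)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def length_factor_needed(reliability, target):
    """How many times the present amount of evidence is needed
    to raise the reliability to the target value."""
    if not (0 < reliability < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return (target * (1 - reliability)) / (reliability * (1 - target))


if __name__ == "__main__":
    # Illustrative figure only: a single measure with reliability .50.
    print(round(spearman_brown(0.50, 2), 3))
    print(round(length_factor_needed(0.50, 0.80), 3))
```

Under this formula, each added measure raises the predicted reliability by less than the one before it, which is one way of seeing why a figure stops changing much once enough evidence is in hand.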

Over the years I have come to accept a reliability of .80 in the measure
(or series of measures) of an important objective as adequate for practical
decisions in the ordinary course of schoolwork. In this booklet I suggest an
examination week at the end of each quarter or semester in which one day
is reserved for English language and literature, another for foreign
languages and literature, a third for history and social science, and so on
(pages 41 ff.). The essay and objective parts of the examination on English
language arts are virtually guaranteed to yield the desired reliability in just
one day of testing. During the following week, most students are on
vacation, but make-up examinations are scheduled in the same order for
students who were absent or who wish to improve their grade. When students
repeat an examination for this purpose, whichever grade is higher stands
in the record. The recommended scoring procedure yields convincing
evidence of the average amount of improvement in writing from one grade to
the next within each curriculum, and it shows students how much their
writing improves on successive examinations.

At this point, before we explain why it is necessary, some readers will be
shocked to learn that such an examination requires two essays. Those
written in the morning are graded independently by two teachers, and those
written in the afternoon are graded independently by two different
teachers. Whenever the two grades differ by more than a certain amount, the
paper is referred to a small committee of the most experienced teachers,
who substitute their own grade for whichever of the original grades was
farther from their own. Before the examination, the teachers indicate how
many students in each of their classes they expect to make each grade:
not which students, but how many. These estimates are added and
converted to percents as guidelines to the number of papers the teachers
should expect to find at each level of merit. Their pooled judgments need
not look anything like the normal curve. If they have reason to believe that
the group is superior, they may aim at a distribution in which no one fails,
only 10 percent get D's, 50 percent C's, 25 percent B's, and 15 percent A's.
Ways of combining the four essay grades and four objective scores are
suggested that will make the distribution of final grades conform to these
expectations.
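
The two bookkeeping steps just described can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the class estimates are invented, and the text leaves the "certain amount" unspecified, so `max_gap` is a placeholder parameter. What happens when both original grades are equally far from the committee's grade is likewise not stated above; here the second grade is the one replaced.

```python
def pooled_percents(class_estimates):
    """Add the teachers' per-class estimates (grade -> expected number of
    students) and convert the totals to percents."""
    totals = {}
    for estimate in class_estimates:
        for grade, count in estimate.items():
            totals[grade] = totals.get(grade, 0) + count
    n = sum(totals.values())
    return {grade: 100.0 * count / n for grade, count in totals.items()}


def resolve(first, second, committee=None, max_gap=2):
    """Return the two grades that stand for one essay.

    max_gap is an assumed threshold; the text only says 'a certain amount'.
    """
    if abs(first - second) <= max_gap:
        return (first, second)
    if committee is None:
        raise ValueError("discrepant grades need a committee grade")
    # The committee grade replaces whichever original grade is farther from it.
    if abs(first - committee) > abs(second - committee):
        return (committee, second)
    return (first, committee)  # also covers ties (an assumption)


if __name__ == "__main__":
    # Invented estimates for two classes of twenty students each.
    estimates = [{"A": 3, "B": 5, "C": 10, "D": 2},
                 {"A": 3, "B": 5, "C": 10, "D": 2}]
    print(pooled_percents(estimates))
    print(resolve(3, 8, committee=7))
```

The invented estimates were chosen so that the pooled percents reproduce the superior-group distribution mentioned above (15 percent A's, 25 percent B's, 50 percent C's, 10 percent D's).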

These procedures, we hope, will seem more and more reasonable and
feasible as we proceed. Right now, however, many readers are probably
thinking, "How unrealistic! We are already overworked, and it is hard
enough to get our examination grades turned in on time when there is only
one essay that we grade ourselves. Two independent ratings of two essays
by each student plus a review of discrepant grades are out of the
question!"
But how much time does this procedure actually take? I recently
introduced this type of examination in several junior high schools in which we
secured accurate records of the time spent in grading, since most of the
papers were graded in Saturday workshops. We encouraged the teachers
to work rapidly and to trust their first impressions, since we found that
this increased the reliability of grading. Besides, they could count on the
fact that any serious error in judgment would probably be caught by the
second reader and the review of discrepant grades, not because the other
readers are wiser, but because all three are unlikely to err in the same
direction. The essays were short, and there were no corrections or comments
to write. In fact, they were forbidden to write anything at all on the papers
lest it bias the judgment of later readers. Grades were recorded on
separate work sheets.

The average grading time per essay proved to be two minutes. Two
essays per student each graded twice came to eight minutes per student.
[...] of the grades were far enough apart to require review, and since
each review also took two minutes, the average grading time per student
came to just under nine minutes. We had previously made a careful study
of the time required to grade, correct, and comment on homework papers.
It averaged eight minutes per student, and this result was confirmed by a
similar study under different auspices in California. Since there are no
classes during examination week, the teachers did not find this chore
unduly burdensome. The objective exercises were scored by clerks and aides.

What did the teachers get in return? First of all, they had reliable scores
on writing ability and other language arts, and they could prove it to the
satisfaction of the Board, their principal, and their director of testing.
They also had convincing evidence of the average amount of improvement
in writing per year in each curriculum, and they could show students and
their parents how much improvement in writing was revealed in successive
examinations. They got such figures at the end of each quarter or
semester, often enough to keep in touch with the progress of each student. If a
student received a lower grade than his pride would accept, he could take
the make-up examination, and whichever grade was higher would stand in
the record.

Remember now that once our measures reach a satisfactory level of
reliability, adding more measures of the same abilities will not change the
position of many students, and none very far. Most of these teachers had
required a paper every two weeks from their students, and if they were
conscientious about it, it took about [...] hours a week to grade, correct,
and comment on them. Now that they had a reliable measure of writing
ability, the grades were superfluous, and a careful study convinced us that
the corrections were more damaging than helpful. Hence they refused to
grade the homework papers; they cut out most of the corrections; and they
concentrated on brief marginal comments, emphasizing what the student
had done well. At the end, however, they might add one suggestion for the
improvement of the next paper, but rarely more than one.

Here the Defense rests. In the rest of this booklet I outline a system for
the evaluation of language arts that cuts out more than 90 percent of the
grading that goes on day after day in almost every classroom. Fewer and
better measures at longer intervals of time are enough to show students,
their parents, and their teachers how they are doing. At other times
teachers should be free to devote their whole minds to teaching and students to
learning. I firmly believe that measurement should be reduced to a
properly subordinate role in education, but if you think you can get away with
less than the minimum I have recommended in this booklet, the
experience of a lifetime in the field warns me that you are unlikely to succeed.




Factors in Judgments
of Writing Ability



Teachers who have never graded a set of papers that have previously
been graded by another teacher seldom realize how commonly and
seriously teachers disagree in their judgments of writing ability. The most
impressive evidence I can offer on this point came out of a factor analysis of
judgments of writing ability that John French, Sydell Carlton, and I
performed at ETS in 1961. We secured 300 papers written by students in their
first month at three different colleges and had them all graded by sixty
distinguished readers in six occupational fields. As our academic judges
we had ten college English teachers, ten social science teachers, and ten
natural science teachers. As our non-academic judges we had ten writers
and editors, ten lawyers, and ten business executives. For various reasons
seven of these sixty judges were unable to complete their assignments, but
all six fields were adequately represented in the fifty-three judges who
remained. These were all outstanding people who were deeply concerned
about the way students write.

In an actual examination, we bring all the judges together and spend a
day or two discussing grading standards and rating sample papers until we
reach an acceptable degree of consensus. But in this study we wanted
to find out what qualities in student writing intelligent, educated people
notice and emphasize when they are free to grade as they like. Hence we
never brought these sixty judges together; they graded all the papers at
home. Our only directions were to sort the papers into nine piles in order
of general merit, using their own idea of what constituted general merit.
The only rules were that all nine piles must be used, with not less than
twelve papers in any pile. Then, on as many papers as possible, we asked
them to write brief comments on anything they liked or disliked.






Hence the reliability of grading that was shown in this study should not
be taken to represent the reliability usually attained in grading essays for
the College Board, when we adopt strict rules and enforce them by close
supervision. But it is probably typical of the amount of disagreement one
would find in any large group of readers without such training and
discipline that, out of the 300 essays graded, 101 received every grade from 1 to
9; 94 percent received either seven, eight, or nine different grades; and no
essay received less than five different grades from these fifty-three readers.
As the first step in our factor analysis, we had to compute the
correlation (the amount of agreement) between the grades of each reader and
the grades of each other reader. The median correlation in this large (53 x
53) table of correlations was .31.
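
The first step described above, correlating every reader's grades with every other reader's and taking the median, can be sketched in a few lines. This is an illustration, not the study's own computation: the function names are hypothetical, the Pearson coefficient is assumed as the measure of correlation, and the three readers and four papers are invented.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equal-length lists of grades.
    Fails with a division by zero if either reader gave every paper
    the same grade."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def median_reader_correlation(grades_by_reader):
    """Median of the correlations over every pair of readers.
    grades_by_reader maps a reader's name to that reader's grades,
    listed in the same paper order for everyone."""
    readers = sorted(grades_by_reader)
    pairs = combinations(readers, 2)
    return median(pearson(grades_by_reader[a], grades_by_reader[b])
                  for a, b in pairs)


if __name__ == "__main__":
    # Invented grades on a 1-9 scale for four papers.
    grades = {"r1": [1, 2, 3, 4], "r2": [2, 4, 6, 8], "r3": [1, 3, 2, 4]}
    print(round(median_reader_correlation(grades), 2))
```

With fifty-three readers there are 53 x 52 / 2 = 1,378 distinct pairs, so the median is taken over that many correlations.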

This table was subjected to a complex mathematical procedure called
"factor analysis," which has the effect of picking out clusters of readers
from all over the table who agree within their cluster and disagree with
every other cluster to a greater degree than could be attributed to chance.
In effect, it determines how many different schools of thought exist among
the readers as to what constitutes excellence in student writing. In this
study we found five different schools of thought: five clusters of readers
who were evidently judging the papers on somewhat different bases, since
within each cluster there was a moderate amount of agreement on grades
but a substantial amount of disagreement with every other cluster.

We have not yet taught the computer how to tell us what these clusters
were agreeing on, so we resorted to a classification of the comments they
had written on most of the papers. In a trial run, when we used a random
sample of readers, the first result of this classification was utter chaos, for
every cluster appeared to be commenting on everything. The picture only
became clear and convincing when we restricted the classification to the
three readers who stood highest on each factor (that is, who came closest
to the central tendency represented by each factor) and to just those
papers that these readers had graded either high (7-8-9) or low (1-2-3).
Even with this restriction, we finally tabulated 11,018 comments on 3,557
papers under 55 headings, and we reduced the numbers of comments
tabulated under each heading to percentages of the comments written by each
of these selected readers, so that those who wrote the most comments
would not unduly influence the interpretation.
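
The reduction to percentages described above can be sketched as follows. The function name, the two readers, and their counts are invented for illustration; only the idea of dividing each reader's count under a heading by that reader's own total comes from the text.

```python
def comment_percentages(counts_by_reader):
    """Convert each reader's comment counts per heading into percentages
    of that reader's own total, so prolific readers do not dominate."""
    result = {}
    for reader, counts in counts_by_reader.items():
        total = sum(counts.values())
        if total == 0:
            continue  # a reader who wrote no comments contributes nothing
        result[reader] = {heading: 100.0 * n / total
                          for heading, n in counts.items()}
    return result


if __name__ == "__main__":
    # Invented counts: reader "a" wrote ten times as many comments as "b".
    counts = {"a": {"ideas": 30, "mechanics": 10},
              "b": {"ideas": 3, "mechanics": 1}}
    print(comment_percentages(counts))
```

In this invented case both readers come out at 75 percent on ideas and 25 percent on mechanics, although one wrote forty comments and the other four.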



Interpretation of the Five Factors

Then it became quite clear that the largest cluster (16 readers, drawn
from all six occupational fields) was most influenced by the ideas
expressed: their richness, soundness, clarity, development, and relevance to
the topic and the writer's purpose. Notice how even this first finding bears
on a point that is often debated by English teachers. Some give little or no
weight to the ideas expressed in student papers for two reasons. First, they
hold that ideas are the product of God-given intelligence which teaching
cannot alter; teaching can only help students express whatever ideas they
may have more correctly and effectively. Second, they believe that students
have an inalienable right to express any ideas or opinions they have, and
any indication by the teacher that some are better than others, and hence
deserve higher grades, borders on censorship. Other teachers reply that
one can do something about the quality, development, and support of
ideas in student papers by paying attention to them, raising questions
about them, challenging them, and focusing attention on them in class
discussion of selected papers. They add that students like it better when
teachers take their ideas seriously and react to them than when they
confine their attention to errors in expression. Such reactions are seldom
intended or viewed as censorship. It is simply a fact that some papers are
better thought out than others, and comments to that effect are intended
only to encourage students to think carefully about what they write.

However that may be, it is an empirical fact that our largest cluster of
sixteen readers from all six occupational fields had by far the highest
percentage of comments on the ideas expressed, and lower percentages of
comments on the qualities emphasized by the other four clusters. Hence
we must accept it as a fact that a high proportion of intelligent, educated
adults do pay attention to the quality, development, support, and
relevance of the ideas expressed in student compositions and weight them
heavily in their judgment of the general merit of these papers. This is
certainly one basis on which the writing of our students will be judged, and
English teachers will be well advised to give it considerable attention in
their instruction and in their comments and conferences on papers.

The next largest cluster (13 readers) had by far the highest percentage of
comments on errors in usage, sentence structure, punctuation, and
spelling. It was no surprise that seven of the ten college English teachers stood
high on this factor. This may be a good time to explain why I can cite a
number like seven when we tabulated the comments of only the three
readers who stood highest on each factor. That tabulation showed us what the
factor meant, that is, the distinctive emphasis expressed in the comments
of the three readers who best represented that factor. Then we could look
at the occupational fields of the thirteen readers who belonged to this
cluster (whose grades came closer to those given by the three highest readers
than to those given by members of any other cluster) and seven proved to
be college English teachers. Of the three who stood highest on this factor,
however, just one was a college English teacher, another was a science
teacher, and the third was a business executive.

The third cluster (9 readers) showed the highest interest of any group in
organization and analysis, which appear to be closely related. Four of the
seven business executives who completed their assignments stood high on
this factor. They were "organization men" in more senses than one.

The fourth cluster (also of 9 readers, but with no occupational bias)
stood highest in comments on wording and phrasing: the choice and
arrangement of words, including the deletion of unnecessary words. I
suspect that this was at least in part a vocabulary factor (that these readers
were more impressed than other groups by a large, mature vocabulary) but
there was no way to prove it from their comments.

Finally, the fifth and smallest cluster (7 readers, four of whom were
either writers or editors) emphasized style, individuality, originality,
interest, and sincerity: the personal qualities revealed by the writing, which we
decided to call "flavor," although they themselves called it "style." We
avoided the latter as a label for this factor, since the people who
emphasized wording and phrasing were also interested in "style," but in such a
different sense that they came out on a different factor. They were
interested in style in the use of language, but the fifth cluster was interested in
style as the revelation of a personality in writing, as shown by such
comments as "forceful," "vigorous," "outspoken," "sincere," or "inflated,"
"pretentious," "dogmatic," or "sentimental." In any large group of
readers, these seven would probably be recognized as the devotees of creative
writing, and the fact that four of the seven were professional writers or
editors confirmed this impression. You know that the writing of Mark Twain
and Edgar Allan Poe is so different in its general character that you could
hardly mistake one for the other. It is this sort of difference in the
personality revealed by writing that we decided to call "flavor."

If you are interested in numbers, you may have noticed that these five
clusters of readers (16 + 13 + 9 + 9 + 7) add up to fifty-four readers, but
we had only fifty-three who completed their assignments. This is not a
mistake. Although this procedure minimizes overlap among the readers, it
was inevitable that some stood almost equally high on two different
factors, while a few did not belong to any cluster: they disagreed with
everybody. Although it is conceivable that the latter were better judges than
anyone else, the probability is higher that there was too much random
variation in their grades to associate them with any distinct school of
thought.

You may wonder why we did not classify the comments to begin with
and call the largest group of comments Factor 1, the next largest Factor 2,
and so on. The answer is clear and compelling. If you know only the
percentage of comments that can be classified under a given heading, there is
no way to tell how much influence this heading had on the way these
readers graded the papers. You cannot simply ask them, because few if any
readers are conscious of what they are actually responding to in student
writing that makes them grade one paper higher than another. Some of
the most common types of comments did not come out on any factor since
they were made by every type of reader.

Hence you have to find clusters of readers who are judging the papers on
somewhat different bases, since there are significant differences between
the grades assigned by each cluster, yet a fairly high amount of agreement
within each cluster. Then you know that whatever these clusters of readers
are looking at has a demonstrable effect on their grades, since their
grades do in fact differ. You find out what they are looking at by
classifying the comments of the readers who best represent each cluster,
and you find that one cluster has the highest percentage of comments on the
ideas expressed, another the highest percentage of comments on mechanical
errors, and so on. Then you know that these distinctive emphases actually
influenced their judgments.

It was interesting and illuminating that we found five and only five
distinct schools of thought among these fifty-three distinguished readers,
emphasizing ideas, mechanics, organization, wording, and flavor
respectively. There is some room for argument as to the exact
interpretation of these five factors, but there is no reasonable doubt that
our study revealed just five different bases for the judgment of our sample
of 300 papers, or that the distinctive emphases of these five ways of
looking at student writing could be described fairly accurately by the
labels we chose. Another study using a different writing task, different
students, and possibly a different age level might yield somewhat different
conclusions, but the five factors we found in this particular study are a
matter of knowledge, not opinion. We know that these five qualities in
student writing influenced the judgments of this particular set of readers,
and I use the word know deliberately. These results are far more convincing
than any theoretical, armchair analysis of how students ought to write. We
hope, however, that something like this study will be replicated by several
different investigators as time goes on, since truth finally emerges only
after several independent investigations reach essentially the same
conclusions.

There was one other study of this sort, almost concurrent with our own, but
we heard about it only after our study was completed. It was done by the
Italian psychologist Remondino, using papers written in Italian by
eleven-year-olds. Although his method differed slightly from ours, the
factors he found could readily be translated into the labels we chose,
except that he found an additional factor that he called "graphics" and we
called "handwriting, neatness." This addition was explained by the fact
that he used the original handwritten papers, while we had to use typed
copies. Later, when we were having teachers rate handwritten papers, we
added Remondino's factor to our list.

You may think, "The reason for the unreliability of essay grades is now
clear: some readers are influenced mainly by the ideas expressed, others by
the number of errors they notice, others by organization and analysis, and
so on. They are looking at different things in the papers, or they are
weighting them differently."

That is true, but it is not the whole story. The extent to which our
fifty-three readers were influenced by these five factors is indicated by
the sum of their "loadings" on these factors. On the average, the sum of
these "loadings" explained 43 percent of the variance in grades; the
remaining 57 percent was unexplained. Some of the remainder may ultimately
be explained by factors which have not yet come to light or by more
reliable measures of the factors we discovered. But most of it is probably
due to two causes that are not amenable to factor analysis: unique ideas
about grading that are not shared by any other reader, and random
variations in judgment, which may be regarded as errors in judgment. The
extent of the latter might be revealed by having the same judges grade the
same papers six months later, after they had forgotten the grades they
originally assigned. The correlation between their earlier and later grades
might average no higher than .50, which would indicate a large amount of
chance variation in grading. But this would be so expensive, and the
readers would be so reluctant to tackle the same papers again, that we did
not dare to suggest it.

A more detailed explanation of the meaning of our five factors is given in
Appendix A. A few research-minded readers of this report may want to
examine the full, original report of this factor analysis. It was published
(multilithed) by Educational Testing Service in August 1961 as Research
Bulletin 61-15, but it has long been out of print. The only way to get a
copy now is to ask ETS to make a Xerox copy of its file copy, but that
would be very costly, and we advise against it. The full report is
ninety-two pages long, extremely technical, and crammed with figures that
are no longer relevant. The only use a researcher could make of it would be
to study the mathematical procedures used in the factor analysis, but
advances in computer technology since that time have made these procedures
obsolete: there are now simpler, quicker, and less expensive procedures.
One may take it on faith that the procedures we used were sound, and their
results valid, because they were designed and supervised by Ledyard Tucker,
whose authority in the field of factor analysis is unquestioned. All of the
findings relevant to the grading of essays have been reported and explained
in this summary.



The Effect of Bias 

Another danger in grading essays that we must try to avoid is bias on the
part of the readers, either for or against particular students, the views
expressed (such as liberal or conservative), or the way of writing
(ornamented or plain, lengthy or succinct, etc.). There are even particular
types of errors to which some teachers react so strongly that they are
likely to fail any paper in which they appear, no matter how good it is in
other respects. Bias appears most obviously when a teacher is grading the
papers of his own students, knowing who wrote them. If a teacher reads the
paper of a boy known to be dull, lazy, careless, and impertinent, it would
take a remarkable paper to overcome the prejudice that the teacher has
formed against him. On the other hand, if the paper was written by a model
student, or by one with whom the teacher sympathizes because he has
recently had serious trouble at home, the grade is likely to be higher than
a dispassionate analysis of the writing would warrant.

Even when the paper of a given student surprises or disappoints us, we are
likely to change too little. When I get a poor paper from a good student
who generally writes well, I tend to think, "Too bad; he had an off day.
I'm afraid that I'll have to lower his grade to a B." But if that same
paper had been written by a poor student, it could easily get a D or an E.

The effect of this sort of bias was prettily illustrated by an experiment
conducted in twelve school districts in the state of New York by another
man at ETS, Dr. Benjamin Rosner. Since we were comparing four methods of
improving writing, we wanted the grades on writing to be highly reliable so
that we could detect significant differences, even if they were small.
Hence we asked for one test paper per month on a topic selected by the
central staff, written on the kind of paper that yields three sharp, clean
copies. We kept one of these for our files, removed all identification
except a code number from the other two, and sent them back to two
different schools for grading.

The teachers who graded these papers knew nothing whatever about the
writers, not even which school they attended. Soon they complained that
they ought to have at least a little information, such as whether the paper
came from grade 9 or 10 (the only two grades in our study), or from a
regular or "honors" class, because the latter should be judged by higher
standards.

Dr. Rosner said that this was a reasonable request, and it afforded an
opportunity for a sub-experiment on the kinds and amounts of information
about students that led to the most reliable grading. He promised that all
papers would henceforth be stamped with one bit of information each month,
such as whether it came from a boy or a girl, grade 9 or grade 10, a
regular or "honors" class, and so on.

What the teachers did not realize until Dr. Rosner told them at the end of
the year was that half of this information was true and the other half was
false. Remember now that two copies of each paper were sent to different
schools for grading. One month Dr. Rosner would stamp one copy of each
paper "boy" and the other copy "girl." The next month he would stamp one
copy "grade 9" and the other copy "grade 10." The next time he would stamp
one copy "regular" and the other copy "honors," and so on with different
bits of information each month.

The only bit of information that made any difference at all in average
grades was whether the papers were stamped "regular" or "honors," and that
difference was exactly opposite from what the teachers expected. They had
argued that honors classes should be judged by higher standards, but the
papers that were stamped "honors" averaged almost one grade point higher
than the other copies of the very same papers that were stamped "regular."

The explanation seems to be that grading is such a suggestible process that
we find what we expect to find. If we think a paper came from an honors
class, we expect it to be pretty good, and that is what we find. If we
think it came from a regular class, we expect it to be only so-so, and that
is what we find.

If a single word stamped on a paper can have this much effect on grades,
think how much effect the full personality of the student must have when we
grade papers knowing who wrote them, with all their past behavior and
circumstances in mind. Some teachers argue that our knowledge of each
student ought to have this effect: that a poor writer who has done his best
ought to receive a higher grade, while a brilliant writer who has not come
up to his usual standard ought to receive a lower grade than the actual
merits of the paper would justify. I can see some justification for this
treatment of the twelve to twenty papers per year that are written for
practice, but not in the two to four test papers per year that are graded
to determine how well each student actually writes. Then we are judging
writing, not students. Praise or blame enters at a later point. The poor
writer who finally earns a passing grade of D may be congratulated; the
brilliant writer who disgraces himself by getting a B (when he should have
made an A) may be taken sternly to task, or comforted, or urged to repeat
the examination.




Measuring Improvement in Writing 



Bias in grading test papers is easily avoided by a procedure for measuring
the amount of improvement that comes about in each year of a writing
program. I have recommended it in articles in English Journal (Diederich,
Paul B. "How to Measure Growth in Writing Ability." 55 (April 1966):
435-449), and it has been adopted by many junior and senior high schools.
For this purpose we ask all students in a span of three or four grades
(such as grades 7-8-9, or grades 10-11-12, or even grades 9-10-11-12) to
write a paper on the same topic on the same day, but not necessarily in the same
hour. Each student numbers his paper with any number of six digits (like
925,401 or 004,256) that pops into his head and writes no other
identification on his paper. He copies this number on a separate slip and
adds his name, grade, class, teacher, and any other information that may be
required. These name-slips are arranged in the numerical order of these
self-chosen numbers and are locked up until the grading is finished.

Having the students choose their own numbers not only saves the trouble and
expense of stamping code numbers on the papers and keeping a record of
which student received each number, it also gives students greater
confidence that their papers will be judged without knowledge of the
identity of the writers. Duplicate numbers are no problem. When the
name-slips are arranged in numerical order, the duplicate numbers come
together. Then we match the handwriting on the name-slips with the
handwriting on the papers bearing these numbers and change the number of
the student who comes first in alphabetical order, usually by adding 1 to
the last digit. If that results in another duplication, we add 2 or any
other number that will distinguish papers bearing the same number.
The papers are also arranged in the order of these self-chosen numbers,
which puts them into an obviously random order, with all three or four
grades scrambled together, and are divided into as many piles as there are
teachers to grade them. Each teacher records his grades and comments on a
separate work sheet and is forbidden to write anything at all on the
papers, lest it bias the judgment of the second reader. He turns in both
his work sheets and the papers he has graded to the person in charge of the
examination, who locks up the work sheets. Then the papers are turned over
to another teacher for a second, independent rating, with no knowledge of
the grades given by the first reader. Again, the second reader records his
grades and comments on a separate work sheet and writes nothing on the
papers themselves. Both readers should rearrange the papers in their
original numerical order before turning them in.

After all readers have turned in their second batch of papers and work
sheets, the person in charge compares the two grades and pulls out all
papers on which they differ by more than one full grade-point. That is, if
one grade is B and the other C, they will simply be combined to get the
final grade; but if one grade is B and the other C-, that is just over the
one grade-point limit, and these papers should be reviewed by procedures
that will be discussed later. If the "standard scores" for test essays that
will be explained later are used, the two scores must be more than ten
points apart to qualify for a review. In our experience, after a high
school staff has had some practice in grading essays in this manner, only
one paper in ten or twelve needs to be reviewed in order to iron out
serious discrepancies in grades.
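
The review rule just stated can be written out as a short sketch in Python.
It rests on two assumptions that the text does not spell out: that letter
grades run from A down to E, and that a plus or minus is worth one third of
a grade-point. The function names are hypothetical. Grades are held as
whole thirds of a grade-point so that the comparison involves no rounding.

```python
# Letter grades expressed in thirds of a grade-point (assumed scale A-E).
LETTER_THIRDS = {"A": 12, "B": 9, "C": 6, "D": 3, "E": 0}


def to_thirds(grade):
    """Convert a grade such as 'B', 'C-' or 'A+' to thirds of a grade-point."""
    base = LETTER_THIRDS[grade[0]]
    if grade.endswith("+"):
        return base + 1
    if grade.endswith("-"):
        return base - 1
    return base


def needs_review(first, second):
    """True when two letter grades differ by more than one full grade-point."""
    return abs(to_thirds(first) - to_thirds(second)) > 3


def needs_review_standard(first_score, second_score, limit=10):
    """Same rule for standard scores: more than `limit` points apart."""
    return abs(first_score - second_score) > limit
```

Under these assumptions `needs_review("B", "C")` is false, so the two
grades are simply combined, while `needs_review("B", "C-")` is true and the
paper goes to review.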

The main point I want to make now is that staff grading of papers written
by all students in a given school, on the same topic and the same day, and
identified only by numbers chosen at random by each student, will
completely eliminate bias either for or against particular students. The
readers have no idea who wrote any paper, not even the grade in which it
was written, nor whether it came from academic or vocational, regular or
honors classes, since the papers are all mixed together in a random order.
Incidentally, this mixing makes the task of grading the test papers easier,
since the stack of papers given to each reader will probably include papers
all the way from the top class in the highest grade to the bottom class in
the lowest grade of his school. Hence differences in the quality of writing
are far more gross and obvious than in the papers one gets from any one
class.

Moreover, since each student's writing will be judged by at least four
different readers in the course of a year, any bias toward liberal or
conservative views, plain or fancy writing, and the like will almost
certainly be cancelled out. Four readers are not necessarily wiser than
one, but it is unlikely that all four will err in the same direction.



Results in One Senior High School

The grading of even one test essay in this fashion can provide powerful
ammunition against our critics, who often charge that students learn
nothing about writing in high school. Here are the results of rating 1,065
papers written on the same day in grades 10, 11, and 12 of a senior high
school that stood almost exactly at the national average in general verbal
ability.



                 NONACADEMICS                      ACADEMICS
          Grade 10  Grade 11  Grade 12    Grade 10  Grade 11  Grade 12

HIGH          5%        8%        9%         22%       41%       53%
MIDDLE       34%       53%       63%         65%       52%       42%
LOW          61%       39%       28%         13%        7%        5%
AVERAGE      326       397       455         475       606       650



These papers were rated by eight English teachers on a "stanine" scale of 9
points, which we shall not explain because an easier scale will be
explained later. Here it is sufficient to understand that, for clarity in
presentation, we called the three top stanines (24%) a high rating, the
middle three (52%) a middle rating, and the bottom three (24%) a low
rating. The percents show the percentage of students in each grade of the
nonacademic and academic curricula who received high, middle, and low
ratings.
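
The grouping of stanines into three bands, and the percentages built on it,
can be illustrated with a small Python sketch. The function names `band`
and `band_percentages` are hypothetical, and any ratings passed to them in
an example are invented; they are not data from this school.

```python
from collections import Counter


def band(stanine):
    """Label a stanine (1-9) as a high, middle, or low rating."""
    if not 1 <= stanine <= 9:
        raise ValueError("a stanine must lie between 1 and 9")
    if stanine >= 7:
        return "high"    # stanines 7, 8, 9
    if stanine >= 4:
        return "middle"  # stanines 4, 5, 6
    return "low"         # stanines 1, 2, 3


def band_percentages(stanines):
    """Percentage of ratings in each band, rounded to whole percents."""
    counts = Counter(band(s) for s in stanines)
    total = len(stanines)
    return {
        name: round(100 * counts[name] / total)
        for name in ("high", "middle", "low")
    }
```

Applied separately to the ratings of each grade and curriculum, a function
of this kind would yield one column of the table above. Because each figure
is rounded independently, a column may occasionally total 99 or 101.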

Since the papers written by nonacademics were mixed with those written by
academics, the former could get very few high ratings in any grade; the
competition was too formidable for nonverbal students. Their improvement is
revealed more clearly by the percentage who received middle ratings: from
34 percent in grade 10 to 63 percent in grade 12. Best of all is what
happened to the percentage who received low ratings, which declined from 61
percent in grade 10 to 28 percent in grade 12. Hence, although these
nonverbal students could not hope to become really good writers, fewer than
half as many in grade 12 wrote a paper that would really disgrace the
school as in grade 10.

The improvement of the academics is best shown by the percentage who
received high ratings: from 22 percent in grade 10 to 53 percent in grade
12. Since so many moved into higher brackets, their percentage of middle
grades had to decline: from 65 percent in grade 10 to 42 percent in grade
12. This does not mean that the middle group of academics declined in
writing ability. The three percents in each column have to add up to 100
percent, so if more than half finally achieve high ratings, less than half
can remain in the middle group. Their percentage of low ratings declined
from 13 percent to 5 percent for the same reason.



How about dropout of the less able writers as an explanation of the
improvement shown in these percentages? The dropout rate in this school was
negligible. There were only 5 percent fewer students in grade 12 than in
grade 10, far too small a difference to account for the massive shifts in
percentages across this table.

Could grade 12 have simply been brighter than grade 10? This is a question
that the routine collection of standardized test scores year after year is
well equipped to answer. The answer was a decisive "No!" There had been no
significant difference in verbal ability in these two grade levels when
they entered this school. There was, of course, a substantial difference in
verbal ability between academics and nonacademics but not between one grade
and the next within each curriculum.

The bottom line of the table, labeled "Average," refers to the average
stanine scores of all students in each grade of the academic and
nonacademic curricula, with decimal points omitted. This omission is a bit
of strategy that at first seems dishonest but actually gives school board
members and the public a truer picture of the amount of improvement from
one grade to the next. Since stanine scores run only from 1 to 9, the
"actual" averages in this bottom line would run from 3.26 to 6.50, not from
326 to 650 as we have written them. We first reported the "actual"
averages, and the reaction of school board members and even teachers, who
ought to know better, was shock and dismay. A typical comment was, "Look at
the difference between the averages of 11th and 12th grade academics: 6.06
to 6.50, a difference of only .44 of a point, which is less than the
difference between B- and B. Is it worth all the time and effort we put
into teaching composition in grade 12 to produce an average difference of
less than half a grade-point?"

What such critics do not realize is how sluggish the averages of large
groups of students must necessarily be on a scale that has only 9 points.
Given the wide range in ability within each grade, the uncertainty of the
grading, and the tendency of students to write some papers better than
others, would you expect the average of any of these six large groups to be
less than 3 or more than 7? If not, the maximum attainable difference in
such averages for large groups would run from 3 to 7, and this school came
pretty close to it: from 3.26 to 6.50. The smallest difference (.44) between
11th and 12th grade academics is natural and inevitable. As one approaches
the top of any scale, differences are bound to get smaller. Already in
grade 11 the academics had received almost as many 8's and 9's as English
teachers are willing to award. Hence grade 12 could not show much
improvement because there was too little room left to detect further growth.

It then occurred to us that we need not call the lowest stanine 1 and the
highest stanine 9. These are not entities like inches or pounds; they are
dividing lines in distributions of scores, and we may call these dividing
lines anything we like, provided they are successive numbers with equal
intervals between them. Many test publishers call their dividing lines 30,
40, 50, 60, and 70; the College Board calls them 300, 400, 500, 600, and
700. In this case, we decided to call the lowest stanine 100 and the
highest 900. Then we could omit the decimal points with a clear conscience
and save much fruitless, uninformed argument.

These corrected averages reveal two points of interest. First, note that
the nonacademics finally reach an average of 455 in grade 12 while the
academics start with an average of 475 in grade 10. In spite of this large
difference in writing ability, note the relative amount of growth in these
two groups: 129 points for the nonacademics, 175 for the academics. Before
this little study, I asked the English teachers to guess how the
improvement of the nonacademics would compare with that of the academics.
Most of them guessed that the nonacademics would show no improvement at
all, and the most optimistic estimate was that they might possibly show
half as much improvement. That was far off the mark: they gained about 74
percent as much as the academics. No one thereafter regarded the teaching
of writing to these groups as a hopeless task.
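
The comparison of growth in the two groups can be checked directly from the
bottom line of the table. The short Python sketch below simply repeats that
arithmetic; the variable and function names are illustrative, and the
figures are the averages reported above.

```python
# Average scores from the bottom line of the table (stanines x 100).
averages = {
    "nonacademic": {10: 326, 11: 397, 12: 455},
    "academic": {10: 475, 11: 606, 12: 650},
}


def gain(group, start=10, end=12):
    """Growth in average score between two grades for one curriculum."""
    return averages[group][end] - averages[group][start]


nonacademic_gain = gain("nonacademic")  # 455 - 326
academic_gain = gain("academic")        # 650 - 475
ratio = nonacademic_gain / academic_gain

print(nonacademic_gain, academic_gain, f"{ratio:.0%}")
```

The two gains come to 129 and 175 points, and their ratio is a little under
three-fourths.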

The effect of even this first attempt at staff grading of unidentified
papers on the morale of these English teachers was remarkable. They had
been so beaten down by the complaints of colleagues and parents that they
were almost ready to believe that no one learned anything about writing in
high school. But after these figures were published on the education page
of the local newspaper (surely an unusual outcome of any examination!) they
went about with their chests out and chins up, saying, "How long will it be
before the science or social studies teachers can show evidence of such
substantial growth toward any objective of comparable importance? We didn't
know whose paper we were grading, and there was no way to fake the
percentages. So if anyone still thinks that students learn nothing about
writing in high school, will he kindly explain how these shifts in
percentages could occur?"

I should add just this caution in regard to such tables of percentages. I
once conducted such studies on the same day in several junior high schools
of one better-than-average school district, and one school showed far
higher gains from grade to grade than any other. Since I had visited
classes in these schools repeatedly and could not recall any difference in
teaching methods or skill that could account for this finding, I had to
look into their methods of rating the papers. The school with the highest
gains had entrusted the task of rating all the papers to its two oldest
teachers, who had served for many years as College Board readers. The other
schools had involved all their English teachers, even though they had done
nothing to establish standards, and so their ratings were much less
reliable. One can see why this would cut down the apparent gains from grade
to grade if one imagines the extreme case in which all ratings were
assigned by throwing dice. Then there would be no difference at all between
the averages of grades 7, 8, and 9. Thus any element of chance that enters
into the ratings will reduce the apparent gain from one year to the next.
This is another reason for trying to increase the reliability of essay
grades and for learning how to compute their reliability before comparing
gains per year in different schools.
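
The dice-throwing argument can be made concrete with a deliberately simple
model, offered here only as a sketch. It assumes (this is not a procedure
from the study) that each rating reflects the paper's merit with
probability 1 minus `chance_share` and is otherwise a random throw whose
expected value is the overall average, and that the grades are of equal
size. The averages used in the example are invented.

```python
def apparent_averages(true_averages, chance_share):
    """Expected grade averages when part of every rating is pure chance.

    true_averages maps grade -> average score under perfectly reliable
    rating; chance_share is the assumed share of ratings decided by chance.
    """
    overall = sum(true_averages.values()) / len(true_averages)
    return {
        grade: (1 - chance_share) * average + chance_share * overall
        for grade, average in true_averages.items()
    }


def apparent_gain(true_averages, chance_share):
    """Difference between the highest and lowest grade's expected average."""
    seen = apparent_averages(true_averages, chance_share)
    grades = sorted(seen)
    return seen[grades[-1]] - seen[grades[0]]


# Invented averages for grades 7, 8, and 9, for illustration only.
hypothetical = {7: 400, 8: 500, 9: 600}
for share in (0.0, 0.5, 1.0):
    print(share, apparent_gain(hypothetical, share))
```

In this model the apparent gain shrinks in direct proportion to the share
of chance in the ratings and vanishes when every rating is a throw of the
dice, which is the extreme case imagined above.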



Reporting Results to Students 

Although you may agree that the procedure just outlined is a feasible,
convincing way of measuring average improvement in writing from grade to
grade, you may wonder how it can give a true picture of the status and
progress of individual students, once it becomes a standard examination
procedure. It seems unfair to younger, vocational, and remedial students,
since the mixing of papers together without identification throws them into
competition with all other students in the same span of grades. So it does,
and for this reason we report two and occasionally three scores after this
sort of examination.

First, we report a standard score (of a sort to be explained later) that
shows each student where he stands as a writer in the total population of
the school. This is a very important figure because it is the one that
moves. Since there is a great deal of natural and induced growth in writing
ability at this stage of development, an average writer should expect to
stand in the lowest third of his school during his first year, in the
middle third during his second, and in the highest third during his third.
In the traditional grading system, he would get a C in all three grades,
and no one on earth could tell him how much, if any, improvement that
represented. Instead, we use rather large numbers to show him where he
stands in the total school population on each examination, and how he works
his way up through this population as he advances from grade to grade.

Second, we report another score showing each student where he stands in the
group with which he may most reasonably be compared, such as tenth grade
remedial vocational students or twelfth grade academic honors students.
Hence, even if a student stands low in the total population of the school,
his standing within his own group may be quite respectable. This is the
score that corresponds most closely to the kinds of grades that are usually
given.

Third, we may or may not report a growth score showing where a student
stands in comparison with other students who started at or near the same
point. This is not done routinely, however, because growth scores, while
highly regarded, are the least reliable of all educational measures, and
there are wide differences of opinion among testmakers about how to compute
them for individuals. On the whole, I prefer to forget about comparative
growth scores and content myself with showing students how far they have
advanced in the school population on each successive examination. How to do
this will be explained in a later section. I should mention that if we ever
figure out how to get "criterion-referenced" scores on writing ability, the
comparison of one student's growth with that of another should present no
difficulties. At present I see no way to do this, but so many bright people
are working on the problem that there may be a breakthrough at any moment.




Personal vs Staff Grading



I have always taught writing (among other things) and have always believed
that improvement in writing takes a great deal of practice and guidance.
Hence I have nearly always required a paper a week from my students, and in
high school I always graded these papers myself. The grading was the most
difficult, time-consuming, and agonizing part of the whole teaching
process. I did not mind writing brief comments on the good and bad parts of
each paper, but deciding the grade was hard. Then, since I always kept
office hours after school, the rest of the week would be filled by
arguments with students who thought their grade was too low. Some argued,
some blustered, some begged, and some broke down and cried. Some even
brought in their parents, who were usually convinced that I was prejudiced
against their child for some irrelevant reason. If it had not been for
those grades, I would have found teaching a pleasant occupation.

Then by a lucky chance, I began teaching at the University of Chicago,
which had an examining system somewhat like the one described above. There
the opinion of the teacher had no effect whatever on the grades of his
students. Grades depended entirely on six-hour comprehensive examinations
that were given at the end of every quarter. Students could register for
these examinations whenever they felt ready to take them, and if the grade
first attained was below what their pride would accept, they could repeat
them at the end of each quarter until they reached a grade that they
regarded as satisfactory. To encourage such efforts to improve, we made it
a rule that when a student repeated an examination, whichever grade was
higher would stand in the record.



Although the students taking the writing examination were allowed three
hours in both the morning and afternoon sessions of the same day, we tried
to set topics that most students could complete to their satisfaction in
about two hours. We encouraged them to spend about half an hour planning
their paper, an hour writing it, and half an hour revising it. Of course,
some students would write a complete paper during the first hour, tear it
up, and then write another complete paper in the second hour. The third
hour was allowed mainly to keep anyone from feeling hurried and to provide
plenty of time for correction and revision.

These papers were identified only by code numbers and were handed
out in a random order to all members of the composition staff for grading.
The morning papers were graded independently by two teachers and the
afternoon papers by two different teachers. Thus two samples of each
student's writing were judged independently by four different teachers,
selected at random. Papers on which the two grades differed by more than
one full grade-point were referred to a small committee of the most
experienced and trusted readers, who did not know what grades these papers
had received; they knew only that the grades differed. One member of this
committee would give each paper a third independent reading, and a clerk
would substitute this grade for whichever of the two previous grades was
farther from it. If they were equally distant, he discarded the grade
nearest the mean, since combining or averaging grades pushes everybody
toward the middle, and we want to keep them spread out as far as possible.
But if the first grades were B and D and the third was C, he discarded the
lowest grade to give the student the benefit of the doubt.
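
The clerk's rule for discrepant grades can be sketched in a few lines of
Python. This is only an illustration, not the procedure as it was actually
carried out: the numeric values assigned to the letters, the use of C as
"the mean," and the function names are all assumptions made for the
example.

```python
# Hypothetical numeric values for the letter grades (assumed for illustration).
GRADE_POINTS = {"E": 1, "D": 2, "C": 3, "B": 4, "A": 5}
SCALE_MEAN = 3  # assumption: the middle grade, C, stands in for "the mean"


def needs_third_reading(first, second):
    """True when two grades differ by more than one full grade-point."""
    return abs(GRADE_POINTS[first] - GRADE_POINTS[second]) > 1


def resolve_discrepancy(first, second, third):
    """Return the two grades kept after a third independent reading."""
    p1, p2, p3 = (GRADE_POINTS[g] for g in (first, second, third))
    gap1, gap2 = abs(p1 - p3), abs(p2 - p3)
    if gap1 != gap2:
        # Discard whichever earlier grade is farther from the third reading.
        discard_first = gap1 > gap2
    else:
        mean_gap1, mean_gap2 = abs(p1 - SCALE_MEAN), abs(p2 - SCALE_MEAN)
        if mean_gap1 != mean_gap2:
            # Equally distant: discard the grade nearest the mean.
            discard_first = mean_gap1 < mean_gap2
        else:
            # Both equally near the mean (e.g. B and D): discard the lower.
            discard_first = p1 < p2
    kept = second if discard_first else first
    return kept, third
```

For example, `resolve_discrepancy("B", "D", "C")` keeps the B and the C,
which is the benefit-of-the-doubt case described above.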

What was the effect on teaching? After all the years I had spent arguing
over grades with students, it was like coming out of a noisy tunnel into
clear sunlight. I still required a paper a week, but I refused to grade
them. What would be the point? The students knew as well as I that grades
on these practice papers would have no effect on the official grade, which
depended entirely on the examination. Hence what they valued more highly
than grades were tips on what they were doing well or badly. I did not
mind writing these bits of advice or talking them over with students in
conferences on their writing. In thus dealing with about 24 practice
papers written as homework, I could do nearly everything that elementary
teachers try to do with personal grades. I could encourage the
faint-hearted, challenge the over-confident, and praise everything a
student had done that was even a little above his usual standard. I
believe very strongly that noticing and praising whatever a student does
well improves writing more than any kind or amount of correction of what
he does badly, and that it is especially important for the less able
writers who need all the encouragement they can get. After noting four or
five things in their papers that I found interesting and making only one
modest suggestion for improvement, I thanked my lucky stars that I did not
have to put down a grade that would make a liar out of me. Just try
writing several favorable comments on a paper and then giving it a grade
of D. Which will the student believe? And how much faith will he have in
your comments thereafter? An elementary teacher might give the student an
A or a B for trying hard, but a college teacher can't do it if the writing
is below the minimum that other college teachers will accept. Hence, if we
want to use "positive reinforcement" with the students who need it most,
we had better rely on comments and conferences and forget about grades on
the homework papers. If they finally pass the examination even with a
grade of D, or its numerical equivalent, we can congratulate them warmly.
"You passed! How perfectly splendid! Keep on writing as well as you can,
but now you can give more attention to subjects in which you excel."

I honestly believe that those who defend the practice of personal grading
as a holy cause are mistaken about its usual effects. To hear them talk,
the teacher is a nearly perfect being who knows all, understands all,
forgives almost everything, and encourages everybody. But when I examine
whole files of papers that have been marked and commented on by teachers,
many of them look as though they have been trampled on by cleated boots,
and they must have a shattering effect on a sensitive student. I once
wrote a whole paragraph on the sins against decency and tact that I had
found in such comments, and the result was that most of my
English-teaching friends would not speak to me. What I find it hardest to
forgive is misinterpreting what the student wrote and then blaming him for
something that he plainly did not say.



Effects of Excessive Correction

I can judge one of the main effects of personal grading by the attitudes
of students who land in my remedial course in college. They hate and fear
writing more than anything else they have had to do in school. If they see
a blank sheet of paper on which they are expected to write something, they
look as though they want to scream. Apparently they have never written
anything that anyone thought was good. At least, no one ever told them
that anything in their writing was good. All their teachers looked for
were mistakes, and there are so many kinds of mistakes in writing that
their students despair of ever learning to avoid them.

The attitude toward writing that these students have developed is well
illustrated by a story told by the Russian writer Chekhov about a kitten
that was given to his uncle. The uncle wanted to make the kitten a
champion killer of mice, so while it was still very young, he showed it a
live mouse in a cage. Since the kitten's hunting instinct had not yet
developed, it examined the mouse curiously but without any hostility. The
uncle wanted to teach it that such fraternizing with the enemy was wrong,
so he slapped the kitten, scolded it, and sent it away in disgrace. The
next day the same mouse was shown to the kitten again. This time the
kitten regarded it rather fearfully but without any aggressive intent.
Again the uncle slapped it, scolded it, and sent it away. This treatment
went on day after day. After some time, as soon as the kitten saw or
smelled that mouse, it screamed and tried to climb up the walls. At that
point the uncle lost patience and gave the kitten away, saying that it was
stupid and would never learn. Of course the kitten had learned perfectly,
and had learned exactly what it had been taught, but unfortunately not
what the uncle intended to teach. "I can sympathize with that kitten,"
says Chekhov, "because that same uncle tried to teach me Latin."

If everything written by our less gifted writers gets slapped down for its
mistakes, and if this treatment continues year after year, can we expect
that their attitude toward writing will differ from the attitude of the
kitten toward that mouse? I saw the result year after year in my remedial
classes. If I asked them to write anything, they reacted as though I had
asked them to walk a tightrope sixty feet above the ground with no net to
catch them if they fell. It took some time to build up their confidence,
to convince them that writing is as simple and natural as talking, and
that no reader would mind a few mistakes if he got interested in what was
being written about. For some time I never commented adversely on anything
they wrote but expressed appreciation of anything I found interesting, no
matter how badly it was expressed. After students gained confidence I
continued to express appreciation but offered one suggestion for
improvement at the end of each paper. If our writers learn one thing about
writing per paper, that is far above the average.

Allow me to insert two bits of advice about revision. Like most English
teachers, I believe that rewriting an unsatisfactory paper teaches one as
much about writing as writing a new paper, but most students hate it.
They ought to get some sort of reward. The most effective reward I have
found is to distribute a list of topics that I expect to assign during a
quarter or semester, with certain topics starred. Then I tell my classes,
"I shall expect all of you to write papers on the starred topics because
they take up different types of writing, different rhetorical principles,
and the like. But on all the other topics you have a choice. You may write
either a new paper on that topic or rewrite a paper on an earlier topic if
you were not satisfied with it and have since thought of a better way to
treat that topic. If you choose to rewrite, I shall want to see both the
original and the rewritten versions."

My second bit of advice is to duplicate copies of one paper on each
assignment with wide spaces between lines and ample margins. Students
study these papers as homework, grade them, and insert corrections and
comments, including laudatory comments on anything they think was well
done. In class the next day, we go through the paper paragraph by
paragraph, commenting on everything that was good or bad about it, and
suggesting improvements. I have found this practice far superior to "buddy
editing" in which pairs of students exchange papers and try to improve
them. When only one other student sees a paper, he can usually find only
three or four things wrong with it; but when the whole class gets copies of
the same paper and has time to mark it up with corrections and suggestions
before it is discussed, one student or another will notice everything
that the teacher notices. This is the only situation in which I allow a
paper to be ripped to shreds. I either ask the writer's permission to
exhibit his writing (without identification) or, preferably, use a paper
from a previous class. When students do the ripping, they enjoy it and
probably learn more about revision than from rewriting one of their own
papers, since their author's pride is not involved.

Whenever I suggest this practice, some teachers say, "I do the same
thing, only I use a projector." I'm sorry, but that is not the same thing.
Students cannot take the projection home to edit before the discussion;
one cannot project the whole paper at once; and half the time the
projection is unreadable. For this task, duplicated copies of the whole
paper are indispensable.

In this section I have talked about the effects of bias in grading test
essays and how to eliminate it. In so doing I have had to counter the
arguments of teachers who believe that it is almost immoral to grade any
paper without full knowledge of the student (his ability, background, and
circumstances) so that one can adjust the grade to reasonable
expectations. Such teachers think of grades as tokens of praise or blame,
and that view may be all right up to the end of grade 6; at any rate, it
is almost universally held by everyone connected with elementary schools.
Above that point, however, both students and teachers come to look upon
the results of important examinations as information: information that is
valuable only to the extent that it is true. I have argued that praise and
blame enter later: the poor writer who passes, the average writer who gets
a C, and the brilliant writer who gets an A are equally entitled to
congratulations. I have also argued that impersonal grading of
unidentified papers by all members of a composition staff brings about
better relations between students and teachers than the personal grading
of elementary schools. In the staff grading system, the teacher is the
student's friend and guide, never his taskmaster and judge. He would be
delighted if every one of his students made A's, but he can't just give
them A's; they have to earn their marks by the impression their writing
makes on all other members of the department. At the lower end of the
scale, if one of his students fails, or makes a lower mark than his pride
will accept, the teacher feels it just as keenly as the student and does
everything he can to help the student earn a satisfactory grade when he
repeats the examination.

That possibility of repeating the examination if the grade first attained
is unsatisfactory, with the understanding that whichever grade is higher
will stand in the record, does more than anything else to take the curse
off the system. In secondary schools we have to offer a make-up
examination in any case for students who were absent. If it is scheduled a
week or so after the regular examination, and if students who were
disappointed in their grades are allowed to take it, it will have the
effect of reducing fear of the examination and offering a second chance to
students who had an "off day."




Standard Scores for Test Essays 



In staff grading of test essays, each reader gets a large random sample
of papers from a large, heterogeneous student population in which it is
reasonable to assume that writing ability is normally distributed. This
means that each reader should expect to find small numbers of very good
and very poor papers, larger numbers of good and poor, and a still larger
number of average papers. As the number of papers graded by perfect
judges approaches infinity, the distribution of their grades will come
closer and closer to the "normal curve" that is crudely represented in the
following diagram.




[Diagram: normal curve, with the following scales along the base line]

Standard deviation     -2       -1      Mean      +1       +2
Percentile              2       16       50       84       98
Standard score         10       20       30       40       50
Range of scores      1-14    15-24    25-34    35-44    45-59
Letter grade            E        D        C        B        A






In this diagram the distance from left to right represents the quality of
the papers, from very poor to excellent, and the height of the curve
above the base line represents the proportion of papers that we should
expect to find at any given point on the scale of quality.

Of course, in testing any limited number of students, such as a thousand,
there will be departures from this curve for two reasons: this sample
may happen to include more students than usual at some points on this
scale, or our imperfect measures may yield more scores at some points
than would be found by perfect measures of the same characteristics.
Since judging essays is a chancy business at best, the latter cause is more
likely to affect the distribution of grades than the former. Hence, if we
observe the proportions predicted by the normal curve in grading large
numbers of test essays, we are likely to come closer to the truth than if
we rely entirely on intuitive judgments.

The diagram of the normal curve has been divided into five intervals
corresponding to letter grades of E, D, C, B, and A. The proportions of
test essays in these intervals are 5, 20, 50, 20, and 5 percent. These
differ from the proportions traditionally expected but seldom achieved in
the United States (10, 20, 40, 20, and 10 percent), but they are common in
New Zealand. I have come to accept the smaller proportions of top and
bottom grades and larger proportion of middle grades for three reasons.

First, in staff grading of test essays, I have found teachers extremely
reluctant to give as many as 10 percent of the papers either top or bottom
grades, but they willingly settle for 5 percent. In spite of directions to
the contrary, their middle grades always come closer to 50 percent than to
40 percent.

Second, differences in the quality of papers near the middle of the
distribution are hardly perceptible. The closer to the mean one sets
boundaries for the grade of C, the more differences one finds between the
grades of pairs of readers. Teachers grade more confidently, cheerfully,
and reliably if one sets these boundaries around the middle half of the
papers. Then they want to indicate differences between papers that they
regard as high C or low C, commonly expressed as C+ and C-. It is
advantageous to have such distinctions in this large middle group, but
they can be indicated more precisely by the numerical scores below the
diagram, which will presently be explained.

Third, the proportions for the five grades that I now favor have a unique
advantage: average papers in each of these intervals lie almost exactly one
"standard deviation" apart. More precisely, the middle paper in the B and
D intervals lies 1.07 standard deviations from the mean; the middle paper
in the A and E intervals lies 2.06 standard deviations from the mean. The
middle C, of course, stands exactly at the mean. These very slight
departures from exact standard deviations could never be detected by even
a skillful reader, and they would never make a difference in our judgment
of a student or in his placement and prospects in school. It is impossible
to come closer to exact standard deviations than this without resorting to
proportions that teachers would be unable to remember or compute. But
nothing could be simpler than first sorting the papers into three piles
(top quarter, middle half, and bottom quarter); then, on a second reading,
picking out a fifth of the top papers for a grade of A, and a fifth of the
bottom papers for a grade of E, or the numerical equivalents of these
grades.

Teachers who know some statistics often tell me that I should set the
boundaries of the C interval half a standard deviation above and below the
mean along the base line, and those beyond the B and D intervals 1.5
standard deviations from the mean. Knowing how teachers grade papers, I am
more anxious to have the middle paper in each interval anchored to the
standard deviation than to have the boundaries set at mid-points between
standard deviations. In the divisions I have chosen, the average distance
from the mean of all papers in the B and D intervals is one standard
deviation; of all papers in the A and E intervals, two standard deviations.
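
The figures of roughly 1.07 and 2.06 standard deviations can be checked
with a short Python sketch. It assumes a perfectly normal distribution and
computes the average position of the papers falling between two
percentiles; the function name and the percentile boundaries (75 to 95 for
B, 95 to 100 for A) are illustrative choices derived from the 5, 20, 50,
20, and 5 percent divisions, not part of the original procedure.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def interval_mean(lower_pct, upper_pct):
    """Average z-score of papers lying between two percentiles."""
    share = (upper_pct - lower_pct) / 100

    def density_at(pct):
        # The density is zero at the far ends of both tails.
        if pct <= 0 or pct >= 100:
            return 0.0
        return STANDARD_NORMAL.pdf(STANDARD_NORMAL.inv_cdf(pct / 100))

    # For a standard normal curve, the mean of a slice equals the
    # difference of the densities at its edges divided by its share.
    return (density_at(lower_pct) - density_at(upper_pct)) / share


b_mean = interval_mean(75, 95)    # B interval: next 20% above the middle half
a_mean = interval_mean(95, 100)   # A interval: highest 5%
c_mean = interval_mean(25, 75)    # C interval: middle 50%
```

Under these assumptions `b_mean` comes out near 1.07 and `a_mean` near
2.06, while `c_mean` is zero; the D and E intervals mirror B and A below
the mean.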

The Standard Deviation

Now it is time to explain what the standard deviation is and why it is
useful. It is an average of the distances (deviations) of all scores or
ratings from the mean, but a special kind of average. In the usual kind of
average, you would add all the distances from the mean, disregarding
whether they were plus or minus, and divide by the number of distances to
get the average distance. But in this special kind of average, you first
square each distance from the mean, add all these squares, divide by the
number of squares to get the average squared distance from the mean, and
then take the square root. At first you may think that this gets you right
back to the average distance from the mean, but it does not. It gives
greater weight to scores or ratings that are farther from the mean. Hence
it yields a number that is larger than the average distance from the mean,
and this number is called the standard deviation.
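
The two kinds of average can be set side by side in a brief Python sketch.
The seven scores are invented for illustration, and the function names are
not taken from the text.

```python
from math import sqrt


def average_distance(scores):
    """The usual average: mean of the distances from the mean, signs ignored."""
    mean = sum(scores) / len(scores)
    return sum(abs(s - mean) for s in scores) / len(scores)


def standard_deviation(scores):
    """The special average: root of the mean squared distance from the mean."""
    mean = sum(scores) / len(scores)
    mean_square = sum((s - mean) ** 2 for s in scores) / len(scores)
    return sqrt(mean_square)


# Hypothetical scores with a mean of 30.
SCORES = [10, 20, 30, 30, 30, 40, 50]

plain = average_distance(SCORES)      # 60 / 7, about 8.57
special = standard_deviation(SCORES)  # sqrt(1000 / 7), about 11.95
```

Because squaring gives extra weight to the scores of 10 and 50, the
standard deviation exceeds the plain average distance for these scores.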

This computation takes a lot of time, and for most purposes it is
unnecessary. A very close approximation of the standard deviation of
scores on objective tests (assuming that all are positive numbers) is
given by the formula:

$$\text{Standard deviation} = \frac{1.8\,(\text{sum of high fifth of scores} - \text{sum of low fifth of scores})}{\text{number of students}}$$

This computation is easier than finding the average score (the mean)
because you do not even need to add all the scores: only the top and
bottom fifth (rounded to the nearest whole number). You subtract the low
fifth from the high fifth, multiply by 1.8, and divide by the number of
students to get the standard deviation. In a comparison of several
short-cut formulas for the standard deviation (Journal of Educational
Measurement, Winter 1971), this one proved most accurate.

You have already seen another way to approximate the standard deviation
in the case of grading essays. If large numbers of test essays are sorted
into five piles in order of merit in the proportions of 5, 20, 50, 20, and
5 percent, the average distance from the mean of the B and D piles will be
one standard deviation; of the A and E piles, two standard deviations.
Hence you may say that the middle papers in these piles lie one standard
deviation apart.
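
As a rough check on the short-cut formula, the following Python sketch
applies it to an artificial set of 100 scores and compares the result with
the full computation. The sample (evenly spaced points of a normal curve
with mean 30 and standard deviation 10) and the function name are
assumptions made purely for illustration.

```python
from statistics import NormalDist, pstdev


def shortcut_sd(scores):
    """1.8 * (sum of high fifth - sum of low fifth) / number of students."""
    n = len(scores)
    fifth = int(n / 5 + 0.5)  # a fifth, rounded to the nearest whole number
    if fifth == 0:
        raise ValueError("need at least three scores to take a fifth")
    ordered = sorted(scores)
    low_sum = sum(ordered[:fifth])
    high_sum = sum(ordered[-fifth:])
    return 1.8 * (high_sum - low_sum) / n


# Hypothetical scores: 100 evenly spaced quantiles of a normal distribution.
curve = NormalDist(mu=30, sigma=10)
SAMPLE = [curve.inv_cdf((i + 0.5) / 100) for i in range(100)]

approximate = shortcut_sd(SAMPLE)
exact = pstdev(SAMPLE)
```

On this artificial sample the two values agree closely; real score
distributions that depart from the normal curve may agree less well.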

Translating These Letter Grades into Numbers 

Since the grades on test essays will have to be added, averaged, combined
with objective test scores, and subjected to other computations in what
follows, it is necessary at some point to translate them into numbers.
When schools and colleges compute grade-point averages, they most often
use numbers from 0 (E) to 4 (A), with tenths representing positions
between and beyond these whole numbers. I used to prefer numbers from 1
(E) to 5 (A), also with tenths, for two reasons. First, it is
unnecessarily insulting to award a student a grade of 0. Second, when
large numbers of students are tested, a few of their scores will extend as
far as three standard deviations above and below the mean, but only three
students in a thousand will score above or below three standard deviations
if the distribution is normal. It is possible to indicate these extremes
by .1 for the lowest score and 5.9 for the highest if you use numbers from
1 to 5, but there is no way to indicate positions lower than two standard
deviations below the mean if you use 0 to 4. Incidentally, 0 is a handy
symbol for "no data": the student was absent, was too ill to do himself
justice, misunderstood the question so badly that his paper could not
fairly be compared with the others, or was suspected of cheating. Such
zeros should not be averaged with other grades; they should be omitted
until the student takes the make-up examination, which will supply the
missing grade.

After using the scale of 1 to 5 (with tenths) for several years, I found
that many teachers were having trouble with decimal points in complex
computations and regarded them as a nuisance. Students and their parents
also regarded tenths as trifling amounts and complained bitterly if they
missed a higher grade by what they called "one lousy tenth of a point." It
did not alter the true situation in any way, but it made computations
easier and everyone happier to call the midpoints of the five intervals
10, 20, 30, 40, and 50 from low to high. We are free to call them whatever
we like; many publishers call them 30, 40, 50, 60, and 70. The second
digit is understood to refer to tenths of the standard deviation. The
range of scores equivalent to each letter grade is then 1-14 for E, 15-24
for D, 25-34 for C, 35-44 for B, and 45-59 for A, as shown below the
diagram. Ranges for A and E are slightly extended to get out to three
standard deviations above and below the mean, but very few students will
ever be found at these extremes.
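
The scale just described can be written out as a small Python sketch. The
conversion of 30 plus ten times the distance from the mean is one reading
of "the second digit refers to tenths of the standard deviation"; the
function names and the clamping to the range 1 to 59 are assumptions made
for illustration.

```python
# Score ranges for each letter grade, as given below the diagram.
GRADE_RANGES = [
    (1, 14, "E"),
    (15, 24, "D"),
    (25, 34, "C"),
    (35, 44, "B"),
    (45, 59, "A"),
]


def standard_score(deviations_from_mean):
    """Convert a position in standard deviations to the 1-59 scale."""
    raw = round(30 + 10 * deviations_from_mean)
    return max(1, min(59, raw))  # assumed: keep extremes inside the scale


def letter_grade(score):
    """Look up the letter grade for a standard score from 1 to 59."""
    for low, high, letter in GRADE_RANGES:
        if low <= score <= high:
            return letter
    raise ValueError("standard scores run from 1 to 59")
```

For instance, a paper one standard deviation above the mean receives a
standard score of 40 and a letter grade of B.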

Since there are now ten points between the midpoints of grade intervals,
teachers soon begin using these points to indicate their judgments of test
essays more precisely. For example, if a paper is just a shade above a
straight C, they may give it a 31; if it is almost on the borderline
between C and B, they may give it a 34. I have not found it necessary to
set quotas for the number of papers that may be placed at these
intermediate points, since repeated combinations with grades of other
readers, grades on other test essays, and scores on objective tests bring
the final distribution of standard scores close enough to the normal curve
for practical purposes. In any case, the second digit does not mean very
much, since the "standard error" of such ratings (with the reliabilities
usually attained) is roughly 5 points on this scale. This means that if
the same essays were graded repeatedly in exactly the same way, and we
kept averaging the ratings until we were sure what the true rating was,
about two-thirds of these ratings would lie within 5 points of the true
rating, but 5 percent of them would be more than 10 points off. Hence all
that the second digit can tell us, after all the combining that an
examination permits, is whether the final score is closer to B than C,
closer to C than B, and so on for the other intervals.
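
The "two-thirds" and "5 percent" figures follow from treating errors of
rating as normally distributed around the true rating. A minimal Python
sketch, assuming a standard error of exactly 5 points and using a
hypothetical helper name:

```python
from statistics import NormalDist

STANDARD_ERROR = 5  # points on the 1-59 scale, as estimated in the text

# Assumed model: rating errors are normal, centered on the true rating.
rating_errors = NormalDist(mu=0, sigma=STANDARD_ERROR)


def share_within(points):
    """Proportion of ratings lying within `points` of the true rating."""
    return rating_errors.cdf(points) - rating_errors.cdf(-points)


within_five = share_within(5)           # about two-thirds
beyond_ten = 1 - share_within(10)       # about 5 percent
```

Under this model `within_five` is about 0.68 and `beyond_ten` about 0.05,
in line with the rounded figures given above.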

Some teachers speak with scorn of "grading on a curve," but they are
thinking of single classes of twenty to thirty students, graded by their
own teachers. Everyone knows that some classes of this sort are brighter,
better prepared, and more highly motivated than other classes. Perhaps 50
percent of such students ought to get A's, 40 percent B's, and 10 percent
C's. In the staff grading situation, in which we are typically dealing
with something like 1,000 students, graded by eight different teachers,
those are probably the grades that the best classes will get, since their
papers will be compared with those from much less gifted and industrious
classes. With a number as large as 1,000 (usually the total population of
a school, or of three grades) it is reasonable to assume a normal
distribution of writing ability, and grades may be distributed in
accordance with that assumption. But if all the best writers have been
placed in one class of thirty students, and their papers are mixed in with
the other 970 and graded without identification, nearly all of them should
get either A's or B's, barring errors of judgment, and most of these
should be caught by the machinery of double grading and review of
discrepant grades, as previously explained.






In a larger perspective, sophisticated use of the normal curve is the best
guide I know to the proportions of the various grades that different
classes should be expected to achieve. Although there are complications
that are too technical to explain, and professional judgment may modify
the result, the general idea may be conveyed by the following example. It
is well known to testmakers that the best predictor of general verbal
ability is usually a standardized test of reading comprehension plus
vocabulary, taken routinely by all students in most schools. Suppose the
distribution of scores on this test in your school looks like this:

Reading + Vocabulary Scores of All Students in This Grade

Lowest 5%    Next 20%    Middle 50%    Next 20%    Highest 5%
   (E)          (D)          (C)          (B)          (A)
  0-13         14-24        25-36        37-47        48-60

What percent of your students stood within these ranges of scores?

   0%           10%          50%          25%          15%

If your students are indeed as superior to the general run of students in
their grade as these reading and vocabulary scores indicate, and if they
work up to their ability, then, on tests that are closely related to
verbal ability, no one should be expected to fail, only 10% should be
expected to get D's, 50% C's, 25% B's, and 15% A's. Such figures, of
course, should be taken as only a rough guide to what you should expect,
since no short standardized test for a small number of students is a good
enough predictor to trust very far. Still, if you gave 25 percent of these
students failing grades, your principal would be justified in raising
questions.



Professor Edward Gordon of Yale tells about an examination he once
conducted for the College Board. He explained and illustrated the scale of
five points that was to be used and had the readers practice using it by
grading copies of a set of sample papers.

When the actual grading began, he noticed that one military-looking
gentleman, an instructor from West Point, was obviously not using the
scale. His grades were all two-digit numbers: 53, 83, and so on.

"How do you get these numbers?" asked Dr. Gordon.

"Well, Dr. Gordon," replied the military gentleman, "I'm too old a dog to
learn new tricks like that new-fangled scale you wanted us to use. So I just
went back to my usual way of grading papers, knowing that you're smart
enough to translate my grades into any scale you please. I just count the
number of mistakes and subtract that number from 100 percent."

"But what do you call a mistake?" asked Dr. Gordon.

The man's astonishment was obvious. "Why surely, Dr. Gordon, you
know what a mistake is!"






Setting Grade Lines in Accordance with Teachers' Predictions

Although standard scores for test essays are nothing more than a
translation of letter grades into numerical equivalents, there may be no
immediate prospect of getting your school or department to adopt them. Let
us see, then, how to get nearly the same results with letter grades, using
as predictors the pooled judgment of several teachers as to the number of
students in each of their classes who are likely to make each grade on the
examination. Suppose their predictions turn out as follows:



Class       E     D     C     B     A    Total

1           0     0     8    10     7      25
2           0     1     7     9     8      25
3           0     1    10     9     5      25
4           1     2    10     7     5      25
5           2     4    11     6     2      25
6           2     5    10     5     3      25
7           3     9    12     1     0      25
8           2     8    12     3     0      25

Totals     10    30    80    50    30     200
Percent     5    15    40    25    15     100



These totals and percents are neater than one would find in actual
predictions. They are intended only to illustrate the point that there is
nothing wrong about asking teachers to aim at a distribution of grades in
which there are far more A's and B's than D's and E's if, in their
judgment, the students taking this examination are brighter and better
prepared than the general run of students in their school. Such deviations
from the normal curve are often recommended by directors of testing.

As these teachers grade the test essays, they should expect to place
about 15 percent of the papers they receive in their A pile, about 25
percent in their B pile, and so on. If they deviate from these predictions
by more than 5 percent, they should expect some heated arguments from
their colleagues before the grades are turned in. For example, if one
teacher fails 15 percent of the papers he grades, he should be prepared to
explain why, because the others think that not more than 5 percent of this
group should fail. These predictions are based on a great deal of prior
experience with these students and should not be disregarded. On the other
hand, predictions should not be followed slavishly because, in a group as
small as 200, most of the papers that deserve failing grades might fall
into the hands of one reader. If the second reader of these papers agreed,
then readers of the other papers should find less than the predicted 5
percent of failures.
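
A reader's pile counts can be compared with the pooled predictions in a
few lines of Python. The predicted percents are those of the table above;
the tolerance of 5 percentage points, the function names, and the sample
counts are assumptions made for illustration, and any flag is only a
prompt for discussion, not a verdict.

```python
# Pooled predictions from the table of teachers' predictions (percent).
PREDICTED_PERCENT = {"E": 5, "D": 15, "C": 40, "B": 25, "A": 15}


def grade_percents(counts):
    """Turn a reader's counts per grade into percents of papers graded."""
    total = sum(counts.values())
    return {g: 100 * counts.get(g, 0) / total for g in PREDICTED_PERCENT}


def out_of_line(counts, tolerance=5):
    """Grades whose share differs from the prediction by more than tolerance."""
    actual = grade_percents(counts)
    return {
        grade: actual[grade] - predicted
        for grade, predicted in PREDICTED_PERCENT.items()
        if abs(actual[grade] - predicted) > tolerance
    }


# Hypothetical reader who fails 15 of the 100 papers he grades.
reader_counts = {"E": 15, "D": 15, "C": 35, "B": 22, "A": 13}
flags = out_of_line(reader_counts)
```

Here only the failing grade is flagged, ten points above the prediction;
the reader's other piles fall within the assumed tolerance.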

Without such guidelines, there is no way to tell whether the grades
turned in by the readers are in line with reasonable expectations. With
them, each reader will know when he is straying very far from the
standards and expectations of his colleagues, and this may cause him to
reconsider some of the grades he has assigned. If he still thinks they are
correct, he will probably formulate his reasons carefully, because he
knows that they will be challenged. When teachers formulate reasons
carefully for the sake of (a) explaining grades that are out of line and
(b) combatting arguments with colleagues, such teachers are also gradually
bringing about closer agreement on grading standards. Such agreement
reduces the unfairness to students that often results from insufficient
thought and care in grading. Over a period of time, it also makes all
members of a department more vividly aware of what they are trying to
teach.

Remember that the predictions indicated only how many students were
likely to make each grade, not which students. Consequently, even if the
final distribution comes out exactly as predicted, there will be many
surprises when the teachers find out which students received these grades.
Some that they thought were sure to get A's will get B's, and some that
they thought would fail will pass.

Although these surprises cause some dismay and argument before the
grades are recorded, it is unfair to change the grades of particular
students, once their identities are known, simply because their teacher
thinks they deserve a higher or lower grade. Such changes would reinstate
all the forces of bias, prejudice, favoritism, and idiosyncratic judgments
that the staff grading procedure was designed to avoid. They add an
unknown allowance for hard work, compliance with requirements, attention
in class, sympathy with the student's misfortunes, etc. to the original
meaning: a simple measure of competence in English. Such changes also
cause the staff an endless amount of trouble. It is almost impossible to
keep it a secret that some grades were changed at the insistence of a
teacher. Then, as soon as Carlos learns that Emile's grade has been
changed, he comes to his teacher with tears in his eyes and begs him to
insist that his paper be reconsidered also. Soon almost all students
except those who were agreeably surprised by their grades will besiege
their teachers with requests to have their papers reviewed. Unless all the
teachers hold the position that the way to change a grade is to take the
make-up examination, they will probably revert to personal grading the
following year.

Even though they do hold the line, it often happens that teachers feel
some injustice was done to their students. They may win support for
weighting teacher evaluations in the final grade decision. The weight most
often adopted is half for the course grade, determined by each teacher,
and half for the examination grade, determined by the staff grading
procedure plus scores on the objective sections. Incidentally, if the staff
wishes to have the course grade count as much as the examination, it is wise
to insist that course grades be turned in before examination grades are
reported. Many teachers are so uncertain about their judgments that if they
think a student should get a B but the examination says C, they change
their minds and put down a C. I recall two colleges, one of which secured
course grades in freshman composition before the final examination, the
other after the examination grades were reported to teachers. In the
former, the correlation between the two grades was usually about .60; in the
latter, about .80.




Computing the Reliability 
of Essay Grades 



To find out whether the reliability of grades on test essays in your
department stands in need of improvement, and whether the procedures I
have recommended or any other procedures bring about improvement,
you need a way of computing reliability that is easy to understand and
takes very little time. English teachers are often allergic to computation
and have neither time nor interest enough to learn the complex method of
computing reliability that is explained in books of statistics.

Fortunately there is a quick and easy way to do it that I call
"top-quarter tetrachorics." It applies to any set of papers that has been
graded independently by two readers. For this purpose both must indicate
which papers they would place in the top quarter in general merit. This must
be precisely the top quarter, rounded to the nearest whole number. For
example, if there are 215 papers, both readers must indicate which 54 papers
they regard as the best. This causes no extra trouble because, in the
grading procedure I have recommended, one starts by sorting the papers into
three piles: top quarter, middle half, and bottom quarter. Since the papers
are usually identified only by code numbers, the first reader arranges the
top quarter in numerical order and sends a list of their numbers to the
person who will compute the tetrachoric. When the second reader gets this
batch of papers (rearranged in their original random order), he or she does
the same. The person in charge has a list of all 215 numbers in numerical
order, and puts a check after each number that the first reader put into
the top quarter, then a check after each number that the second reader
put into the top quarter. The person in charge counts how many numbers
have two checks: that is, how many papers were placed in the top quarter
by both readers.
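The clerical step just described can be sketched in a few lines of Python. This is only an illustration: the function names and the two lists of code numbers are invented here, and rounding a half upward is an assumption, since the text says only "rounded to the nearest whole number."

```python
def top_quarter_size(n_papers):
    """Size of 'precisely the top quarter', rounded to the nearest whole number."""
    # int(x + 0.5) rounds halves upward (an assumption); Python's round()
    # would turn 52.5 into 52.
    return int(n_papers / 4 + 0.5)


def count_double_checks(first_reader_top, second_reader_top):
    """Number of code numbers that both readers placed in their top quarter."""
    return len(set(first_reader_top) & set(second_reader_top))


n_papers = 215
quota = top_quarter_size(n_papers)

# Hypothetical lists of code numbers, invented for illustration only.
first_reader_top = list(range(1, quota + 1))      # papers 1..54
second_reader_top = list(range(29, 29 + quota))   # papers 29..82

both = count_double_checks(first_reader_top, second_reader_top)
percent_both = 100 * both / n_papers
print(quota, both, round(percent_both))
```

With these made-up lists the two readers share 26 papers out of 215, the same figures used in the worked example that follows.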

Let us suppose that twenty-six numbers appeared in both of their lists.
To change the number to the percent of this group that both readers
placed in their top quarter, divide 26 by 215, the total number of students.
Here it is 12 percent, corresponding (in the following table) to a tetrachoric
correlation of .50. This is an estimate of the amount of agreement between
the two readers, which is regarded as the reliability of one rating. Since we
intend to use the sum or average of both ratings as the grade on each
paper, that reliability must be "stepped up" by the Spearman-Brown
Prophecy Formula: twice the correlation (between the two ratings) divided
by one plus that correlation. That brings the reliability of this set of essay
grades up to .67, as shown below in the third line of this table:

Percent in top quarter of both
 06%  07%  08%  09%  10%  11%  12%  13%  14%  15%  16%  17%  18%  19%  20%

Tetrachoric correlation
 [?]  .07  .17  .26  .34  .42  .50  .57  .64  .70  .75  .80  [?]  [?]  [?]

Reliability of sum or average
 [?]  [?]  .29  .41  .51  .59  .67  .73  .78  .82  .86  .89  [?]  [?]  [?]
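For readers who would rather compute than look up, the table can be approximated numerically. The sketch below is not the author's method: it assumes that both readers' judgments follow a bivariate normal distribution with each cut placed exactly at the top quarter, integrates that distribution with Simpson's rule, and searches for the correlation that reproduces the observed percent. All function names are invented for illustration.

```python
import math
from statistics import NormalDist

STD = NormalDist()
CUT = STD.inv_cdf(0.75)   # point that cuts off the top quarter of a normal curve


def share_in_both_top_quarters(r, steps=2000):
    """Expected share of papers in both top quarters if the ratings correlate r."""
    # Integrate P(x > CUT and y > CUT) over x from CUT to CUT + 8 (Simpson's rule).
    width = 8.0 / steps
    scale = math.sqrt(1.0 - r * r)
    total = 0.0
    for i in range(steps + 1):
        x = CUT + i * width
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * STD.pdf(x) * (1.0 - STD.cdf((CUT - r * x) / scale))
    return total * width / 3.0


def top_quarter_tetrachoric(observed_share):
    """Bisection search for the r whose expected share matches the observed one."""
    low, high = -0.99, 0.99
    for _ in range(40):
        mid = (low + high) / 2.0
        if share_in_both_top_quarters(mid) < observed_share:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def step_up(r):
    """Spearman-Brown: reliability of the sum or average of two ratings."""
    return 2.0 * r / (1.0 + r)


r = top_quarter_tetrachoric(26 / 215)
print(round(r, 2), round(step_up(r), 2))
```

For the 26-out-of-215 example the result should land close to the .50 and .67 read from the table; small differences in the second decimal are to be expected, since the printed table is entered with whole percents.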

The standard but more difficult way of computing correlations between
two sets of essay grades or other measures is called "product-moment"
correlation. Roughly speaking, tetrachoric correlations mean the same
thing as product-moment correlations, but they are less precise and more
subject to chance variation. In technical terms, the standard error of a
tetrachoric is approximately twice as large as that of a product-moment
correlation for groups of the same size.

Still, tetrachorics are better than nothing, and if they are computed
routinely in all essay testing operations (between pairs of readers, between
morning and afternoon essays, and the like) they will tell you whether the
reliability of essay grades in your department is improving, and whether it
has reached a level that is adequate for practical decisions in the ordinary
course of school work.

Over the years I have come to accept a reliability of .80 as adequate for
that purpose, especially when the examination includes objective sections
that yield far higher reliabilities than essays per unit of testing time. But
the reliability of the essay grades we computed as an example on page 33
was only .67. That is far from satisfactory, but it is typical. Even after
working with an English staff for some time, I have rarely been able to
boost the average correlation between pairs of readers above .50, and other
examiners tell me that this is about what they get.




Fortunately there is another form of the Spearman-Brown Prophecy
Formula that tells how many times to increase the length of a test (a
number usually represented by k) to attain any desired reliability.

$$k = \frac{(\text{the reliability you want}) \times (1 - \text{the reliability you got})}{(\text{the reliability you got}) \times (1 - \text{the reliability you want})}$$

Since we want .80 and got .67, this becomes

$$k = \frac{.80 \times (1 - .67)}{.67 \times (1 - .80)} = \frac{.80 \times .33}{.67 \times .20} = \frac{.264}{.134}$$
True, the fraction does not exactly equal 2, but that is due to "rounding
error." The .67 and .33 obviously represent two-thirds and one-third. If we
substituted fractions for decimals, the numerator would be 4/5 x 1/3 =
4/15. The denominator would be 2/3 x 1/5 = 2/15. Since 4/15 is exactly
twice as large as 2/15, our conclusion is sustained.
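The arithmetic above is easy to check by machine. The following is a minimal sketch; the function name `lengthening_factor` is invented here, and exact fractions are used only to show that the departure from 2 is purely a matter of rounding.

```python
from fractions import Fraction


def lengthening_factor(wanted, got):
    """How many times longer the test must be to reach the wanted reliability."""
    return (wanted * (1 - got)) / (got * (1 - wanted))


# With the rounded decimals: a little under 2.
print(round(lengthening_factor(0.80, 0.67), 2))

# With exact fractions (4/5 wanted, 2/3 obtained): exactly 2.
print(lengthening_factor(Fraction(4, 5), Fraction(2, 3)))
```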

So we have to double the length of our test in order to attain a reliability
of .80. What does this mean? In objective tests, exactly what it says: you
make up twice as many items of the same kind. But in the special case of
essay tests, there are three possible interpretations, one of which is wrong,
another correct but not feasible, and a third that is both feasible and more
informative. If we simply doubled the time allowed for the essay but still
got only one grade on it from each reader, it would have little, if any, effect
on reliability. If we had each essay read by four instead of two readers, it
would indeed increase the reliability of grades on this particular essay to
.80, but it is hard enough to get two independent ratings of each essay, and
few schools could afford the time or expense of giving each essay four
independent ratings. Besides, the result would not indicate how consistent
the students are in the quality of their writing from one topic to another, or
from one time to another. The most fruitful interpretation, therefore, is
"Do the same thing over again." Have the students write a second essay on
a different topic, but one that requires the same mode of writing and is
equally familiar to all students, and have a different pair of readers rate
this essay independently. In my experience, having students write two
short essays in the same session of an examination does not constitute
genuinely independent samples of their writing. They rarely differ more
than the first and second pages of the same essay. There must be some
separation in time as well as in topic before one can judge the average
quality of a student's writing on different occasions. It has been the
experience of many examiners in different colleges that the shortest possible
separation in time for this purpose is to have one essay written in the
morning and the other in the afternoon of the same day, and the examination
schedule of most colleges does not permit any longer separation in time
than this. That is why so many examining boards have adopted this policy
if they intend to attach any real weight to the essay grades. Of course, you
will find examinations that require just one short essay, but in such cases
the examiners rely on the objective sections to carry practically the whole
burden of reliability.

If you have a director of testing, one procedure that I have recommended
may worry him when the time comes to compute tetrachoric correlations.
I said that whenever two grades on a test essay differ by more
than one full grade-point (or more than 10 points if you use standard
scores), refer the paper to the most experienced reader who has not already
graded it for a third independent rating. A clerk will substitute this grade
for the previous grade farthest from it (see page 20). Such revisions of
discrepant grades necessarily increase correlations above the level that your
director of testing expects when he correlates uncorrected grades, and he
may cry "Foul!" Remember that correlations tell you how closely two sets
of measures agree, and if you take all pairs of grades that disagree sharply
and substitute a third grade that is closer to one or another, you
automatically increase the correlation. But what else can you do? It would be
stupid to correlate just the original grades, because the grades you discard
have no effect on students' grades. What you want to compute is the
reliability of students' grades, and for that purpose you have to correlate the
two ratings that actually determine the grade. In any case, your director of
testing or statistical consultant has little cause for complaint. He is used to
getting correlations of .30 to .40 between sets of uncorrected grades, and
they make the reliability so low that the essay grades are practically
meaningless. If you discard about 10 percent of extremely aberrant grades
and substitute genuinely independent grades of a more experienced
reader, you will probably get tetrachorics in the neighborhood of .50, and
they bring the reliability of students' grades on one essay up to .67. Then,
as we have seen, all you have to do is to secure a second essay, graded in
the same way, to attain a reliability of .80.
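The clerk's substitution rule might be sketched as follows. The numeric grade scale, the function name, and the handling of an exact tie (the second grade is the one replaced) are all assumptions made for illustration; the text specifies only that the third grade replaces the earlier grade farthest from it.

```python
def resolve_grades(first, second, third_reader=None, max_gap=1.0):
    """Return the two ratings that actually determine the grade on one paper.

    first, second: numeric ratings from the two original readers.
    third_reader:  a callable returning a third independent rating; it is
                   consulted only when the first two differ by more than max_gap.
    """
    if third_reader is None or abs(first - second) <= max_gap:
        return first, second
    third = third_reader()
    # Discard whichever earlier grade lies farthest from the third rating.
    if abs(first - third) > abs(second - third):
        return third, second
    return first, third


# Hypothetical ratings on a 0-4 scale.
print(resolve_grades(3, 2, lambda: 0))     # within one point: left alone
print(resolve_grades(4, 2, lambda: 3.5))   # second grade replaced
print(resolve_grades(1, 4, lambda: 3))     # first grade replaced
```

It is the returned pair, not the original pair, that would go into the tetrachoric computation.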

The reliability of grading is one thing, however: it shows how closely
four readers agree in judging the merits of two essays. What it leaves out is
the reliability of the students. To what extent do they tend to write as well
on one topic as on another, and on different occasions? The only way to
find out is to correlate the sum or average of their grades on the first essay
with the sum or average of their grades on the second. This is computed in
the same way as the reliability of the grading (as explained on page 33):
find the percent who stood in the top quarter of final grades on both
essays, look down to the corresponding tetrachoric, and below that to the
reliability. In my experience, if the average reliability of the grading is .80,
the stepped-up correlation between final grades on the two essays is likely
to be about .70. That is the over-all reliability of the essay part of the
examination, including both the variation in readers' judgments and the
variation in quality of writing from one topic to another.






Although that final reliability of .70 is lower than I like, I do not know
any examiner who consistently does better than this in any sort of essay
examination that is administratively feasible, unless he adopts rules that
artificially constrain the grading. Of course, in essay tests designed to
measure information and understanding, as in history, one can do better,
but not much better in tests designed to measure writing ability. If you
need a reliability of .90 or better to determine the outcome of a controlled
experiment, you will have to get eight or more test essays. Otherwise, that
final reliability of .70 on the essay part of the examination can be offset by
the higher reliability of the objective sections, as I shall now explain.
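The figure of eight essays can be checked with the lengthening form of the Spearman-Brown formula given earlier. This sketch assumes that every additional essay behaves like the two already collected (same mode of writing, separate occasion, its own pair of readers); the function name is invented for illustration.

```python
import math


def lengthening_factor(wanted, got):
    """How many times longer the test must be to reach the wanted reliability."""
    return (wanted * (1 - got)) / (got * (1 - wanted))


essays_now = 2                              # two essays yield about .70
factor = lengthening_factor(0.90, 0.70)     # times the present length
essays_needed = math.ceil(essays_now * factor)
print(round(factor, 2), essays_needed)
```

The factor comes to a little under four times the present two essays, which rounds up to eight.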



Computing the Reliability 
of Objective Tests 

The last section should lodge forever in your memory the basic meaning
of test reliability: the amount of agreement between two sets of independent
measures of the same characteristic, taken at about the same time.
You have more confidence in a test if you measure the same thing twice
and get approximately the same result both times.

It was easy to see how to do this in the case of essays: correlate two sets
of independent ratings of the same essays, or correlate grades on one essay
with grades on another essay written by the same students.

But how do you do it in the case of objective tests: for example, a
vocabulary test of sixty items? There you have only one measure: a single
score for each student. How do you know how close you would come to
getting the same scores for these students if you gave them another test of
the same kind? There is not enough time in ordinary school testing to
administer two comparable forms of every test.

Let me explain how professional testmakers do it in constructing such a
vocabulary test, not because you want to learn how to construct vocabulary
tests but because the laborious procedures they employ are the basis for
the quick and easy formula for objective test reliability that I shall
presently explain, and they will help you understand what it means.

The vocabulary testmaker usually wants to produce two comparable
forms so that you can use one before instruction and one after, or one in
the regular examination and one in the make-up examination. Since he
knows that many of the items he writes will be discarded after tryout
because they are too hard, too easy, have either two right answers or no right
answer, or have some other defect, he writes perhaps 200 items like the
following:

exploit:   A. go off with a loud noise            C. run away and hide
           B. make use of for one's own benefit   D. throw away

Each tryout form has 100 items of this sort, which nearly all students
can finish in 35 minutes or less. A good trick to remember in trying out a
new test is to arrange the forms in each package in what testmakers call a
"spiral" order so that the first student in each tryout class will get Form A,
the next Form B, and so on. Thus both forms are administered
simultaneously in each tryout class, but each student takes only one form. If
there are as many as eight tryout classes (and there are usually more than
this), one can be pretty sure that the average ability of students taking
Form A is equal to the average ability of students taking Form B, since a
random half of the students in each class took each form. That would not be
the case if four classes took Form A and another four Form B.
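The "spiral" order amounts to dealing the forms out in rotation within each class. A minimal sketch, with an invented roster and function name:

```python
from itertools import cycle


def spiral_assign(students, forms=("A", "B")):
    """Hand out forms in rotation: first student Form A, next Form B, and so on."""
    return dict(zip(students, cycle(forms)))


roster = ["s1", "s2", "s3", "s4", "s5"]   # hypothetical seating order in one class
print(spiral_assign(roster))
```

Because the rotation restarts in every class, each form reaches about half of each class rather than all of some classes and none of others.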

From the results of the tryout, the testmaker discards items that are too
hard, too easy, or defective and arranges the rest in order of difficulty.
From this arrangement he selects items 1, 3, 5, 7, 9, etc. for Final Form A;
items 2, 4, 6, 8, 10, etc. for Final Form B. They will probably not be
arranged in order of difficulty in the published forms, because then students
tend to give up as soon as the items get hard, but if they keep finding easy
items interspersed with harder ones, they are more likely to finish the test.
Hence the selected items are often rearranged in the alphabetical order of
the words to be defined. Let us suppose that there are sixty items in each
Final Form, and one can be reasonably confident that they are equal in
difficulty. The testmaker also tries to make the two forms equal in
discriminating power by using a figure called "biserial r" that is routinely
computed for each item. It would take too long for present purposes to
explain precisely what this means, but in general it answers the question: to
what extent did high-scoring students on the total test do better on this
particular item than low-scoring students?
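The alternating selection of items for the two Final Forms can be sketched as below. The words, the proportions, and the use of "proportion answering correctly" as the measure of difficulty are all assumptions for illustration; the text says only that the surviving items are arranged in order of difficulty.

```python
def split_into_final_forms(items):
    """items: (word, proportion_correct) pairs that survived the tryout."""
    # Easiest first: a higher proportion correct means an easier item.
    by_difficulty = sorted(items, key=lambda item: item[1], reverse=True)
    form_a = by_difficulty[0::2]   # ranks 1, 3, 5, ...
    form_b = by_difficulty[1::2]   # ranks 2, 4, 6, ...
    # Published order: alphabetical by the word to be defined.
    return sorted(form_a), sorted(form_b)


tryout = [("exploit", 0.55), ("abate", 0.80), ("candid", 0.70), ("ominous", 0.40)]
form_a, form_b = split_into_final_forms(tryout)
print([word for word, _ in form_a])
print([word for word, _ in form_b])
```

Balancing the forms on discriminating power as well would need the item statistics mentioned above and is left out of this sketch.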

The final step is to get as many teachers as possible in different schools
to give both Final Forms to the same classes on successive days. Then the
testmaker can compute the correlation between scores on the two Final
Forms, since the same students took both. He does not "step up" this
correlation by the Spearman-Brown formula because he does not expect any
teacher thereafter to give both forms to the same students in one
examination. The correlation between scores on Forms A and B is itself the
reliability of either form. This is called "parallel form reliability," and it is
the most highly esteemed, especially if the testmaker reports a range of
reliabilities for groups of class size. It clearly conforms to the definition of
test reliability: the amount of agreement between two independent measures
of the same characteristic, taken at about the same time.

Since teachers do not have time to apply this procedure to their own
tests, but still ought to have some easier way to compute their reliability, it
first occurred to someone that, if you have only one form, you can break it
up into something like parallel forms by getting one score on odd-numbered
items and another score on even-numbered items. The correlation
between scores on these random halves is the reliability of the half-test and
has to be "stepped up" by the Spearman-Brown formula to get the
reliability of the whole test. This is called "split-half" or "odd-even"
reliability, and it is still widely used. It should not be used with speeded
tests because students get the same score (0) on all items that they do not
reach, and this spuriously increases the correlation between odd-even halves.
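A split-half computation might look like the sketch below. The answer matrix is invented, far too small for real use, and serves only to show the steps: score the odd and even items separately, correlate the two half-scores (an ordinary product-moment correlation is used here), and step the result up.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(answers):
    """answers: one list of 0/1 item scores per student, items in test order."""
    odd = [sum(row[0::2]) for row in answers]    # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in answers]   # items 2, 4, 6, ...
    r_half = pearson(odd, even)                  # reliability of a half-test
    return 2 * r_half / (1 + r_half)             # Spearman-Brown step-up


# Four hypothetical students, four items each (1 = right, 0 = wrong or omitted).
answers = [[1, 1, 1, 1],
           [1, 1, 1, 0],
           [1, 0, 0, 0],
           [0, 0, 0, 0]]
print(round(split_half_reliability(answers), 2))
```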

Next, Kuder and Richardson devised a long series of formulas that
yielded almost the same results as the split-half method. Their Formula 20
is most often used today by large testing organizations like ETS to
determine the reliability of their objective tests. The only trouble with it is
that you have to know how many students answered each item correctly, and
unless you have data-processing equipment, that takes more time than
teachers can afford.



RELIABILITY

In his Biographia Literaria (Everyman Edition, p. 36), Coleridge pays this
tribute to his friend, the poet and essayist Robert Southey:

"No less punctual in trifles, than steadfast in the performance of highest
duties, he inflicts none of those small pains and discomforts which irregular
men scatter about them, and which in the aggregate so often become
formidable obstacles both to happiness and utility; while on the contrary he
bestows all the pleasures and inspires all that ease of mind on those around
him or connected with him, which perfect consistency, and (if such a word
might be framed) absolute reliability, equally in small as in great concerns,
cannot but inspire and bestow; when this too is softened without being
weakened by kindness and gentleness."

According to the Oxford English Dictionary, this is the first recorded use
of the term reliability (1816), even though it is regularly formed from
reliable, which goes much farther back (1569). The sense in which it is used
by Coleridge, where it stands for consistency and stability, is not too far
removed from the sense in which it is applied to test scores.

The mind-boggling sentence in which it appears is typical of Coleridge.
What he means is, "You can always count on Southey. He's reliable."






Their Formula 21, however, is made to order for teachers. It takes only a
few minutes to compute after you know the mean and standard deviation,
which you ought to compute anyway for the purposes discussed earlier. If
you have forgotten the short-cut formula for the standard deviation, it is
given on page 26.

Here is a slightly simplified version of the Kuder-Richardson Formula
21, which yields a close approximation of the reliability of objective tests in
which all items have equal weight: that is, each correct answer gets one
point and each incorrect or omitted item gets 0. It must be applied only to
raw scores on such tests, not to standard scores, percentiles, or numbers
corresponding to letter grades.

$$\text{Reliability} = \text{ONE minus } \frac{\text{MEAN times (number of items minus the MEAN)}}{\text{Number of items times standard deviation squared}}$$

If you prefer symbols to formulas written out in words, it is:

$$r_{xx} = 1 - \frac{M(n - M)}{n s^2}$$

in which

$r_{xx}$ = reliability

$M$ = MEAN

$n$ = number of items (NOT number of students)

$s^2$ = standard deviation squared

Suppose that, on the vocabulary test of sixty items, the MEAN is 40 and the
standard deviation 10. This becomes:

$$\begin{aligned}
r_{xx} &= 1 - \frac{40(60 - 40)}{60 \times 10^2} \\
       &= 1 - \frac{40 \times 20}{60 \times 100} \\
       &= 1 - \frac{800}{6{,}000} \\
       &= 1 - .133 \\
       &= .867 \text{ (rounded to .87)}
\end{aligned}$$
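The same computation in Python, as a minimal sketch of the simplified Formula 21 given above. The function names are invented, and the helper that works from a list of raw scores assumes the population form of the standard deviation, which may differ slightly from the short-cut formula referred to in the text.

```python
from statistics import fmean, pstdev


def kr21(mean, sd, n_items):
    """Simplified Kuder-Richardson Formula 21, as stated in the text."""
    error = (mean * (n_items - mean)) / (n_items * sd ** 2)   # the UNreliability
    return 1 - error


def kr21_from_scores(raw_scores, n_items):
    """Same formula, starting from raw scores (number right, one point each)."""
    # pstdev is the population standard deviation: an assumption, see above.
    return kr21(fmean(raw_scores), pstdev(raw_scores), n_items)


print(round(kr21(40, 10, 60), 3))
```

Returning one minus the fraction inside the function guards against the slip of reporting the fraction itself as the reliability.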

The most common mistake in applying this formula to your own tests is
to get so involved in manipulating the rather large numbers in the fraction
that you forget to subtract the resulting decimal from ONE. What should
alert you to the mistake is that the fraction usually turns out to be a
relatively small number, like the .133 above. If that were the reliability, it
would be terrible, but it is not: it is the error, the random variation, the
UNreliability. The reliability is ONE minus this decimal, which is .87.

Although this reliability is quite high for an objective test that most
students will finish in 20 minutes or less, one must not expect other objective
sections that are often included in English language arts examinations
(reading comprehension, listening comprehension, and ability to detect
errors in sentences) to do as well. Vocabulary is nearly always the most
reliable objective section of any verbal test for two reasons: the items go so
fast that one can get in a large number in minimal time, and they yield a
large standard deviation, since people vary a great deal in the range and
precision of their knowledge of words. One has to allow about one minute
per item for reading and listening comprehension, and since there is rarely
more than 30 minutes available for these tests, their reliability is likely to
be in the sixties. Usage items (ability to detect errors in sentences) take
about half a minute apiece; hence you can include forty in a 20-minute
test, and its reliability is likely to be about .70.

The average reliability of these four objective tests (reading and
listening comprehension, vocabulary and usage) may well be no higher than
.70. Is that the reliability of the total objective part of the examination? By
no means. The reliability of the total has to answer the question: if you
gave comparable forms of these four tests to the same students tomorrow,
how close would their total scores come to the total scores they got today?
Therefore you must add together their raw scores on these four tests, find
the mean and standard deviation of these total scores, and then apply the
Kuder-Richardson Formula 21. Do not try to give extra weight to items
that take longer and seem more important, or you can't use Formula 21.
Anyway, the total number of correct answers in all four tests is an
adequate basis for computing the reliability of the objective part of the
examination. In the time usually available for the objective sections one can
get in at least 160 items, and it is virtually impossible to attain a reliability
lower than .90 for total scores on 160 objective items that are as highly
correlated as these are likely to be.

In the last section (page 36) we concluded regretfully that the overall
reliability of the essay part of the examination was unlikely to exceed .70,
but it would be offset by the higher reliability of the objective part. Let us
assume now that the reliability of these two parts turned out to be .70 and
.90 respectively. How do we combine these to get the reliability of final
grades on the examination as a whole, assuming that the essay and
objective sections are to have equal weight? The chief statistician at ETS
devised a formula for it, but it turned out that a simple arithmetical average
of the two reliabilities "stepped up" by the Spearman-Brown formula gave
the same result. The average of .70 and .90 is .80. Twice this correlation
divided by one plus this correlation is 1.60 over 1.80, or 16/18, or 8/9, or
.89. This is a conservative estimate of the reliability of final grades on the
examination as a whole, and it is eminently satisfactory.
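The shortcut described here (average the two reliabilities, then step up) takes two lines of code. The function names are invented; the ETS formula mentioned in the text is not reproduced here, only the averaging shortcut.

```python
def step_up(r):
    """Spearman-Brown: twice the correlation divided by one plus the correlation."""
    return 2 * r / (1 + r)


def combined_reliability(essay, objective):
    """Equal-weight shortcut: average the two reliabilities, then step up."""
    return step_up((essay + objective) / 2)


print(round(combined_reliability(0.70, 0.90), 2))
```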

It is almost a pity that teachers usually insist on adding a course grade:
their personal estimate of the amount and quality of work done during the
course. There is no way known to mathematics to estimate the reliability of
that. Still, in the imprecise field of education a mathematically rigorous
system of measurement may need a bit of looseness somewhere to make it
comfortable to live with, and the course grade determined by each teacher
may do just that.



Design for an Examination
in English Language Arts

In the section on the reliability of essay grades, especially on pages
34-35, it was shown that, if you want to give real weight to the essays, you
must secure two test essays from each student with some separation in
time as well as in topic, and the least possible separation in time is to have
one test essay written in the morning and the other in the afternoon of the
same day. Many colleges have found that the only feasible way to get two
essays per day in fields that use essay examinations is to schedule an
examination period of one week at the end of each quarter or semester. In
this week, one day is assigned to each field in which the examination requires
a good deal of writing and half a day to fields in which the examination
consists of objective or short-answer questions, like science and mathematics.
Monday may be reserved for all courses in English language and literature,
Tuesday for all courses in foreign languages and literature, Wednesday
for history and social science, Thursday for mathematics and natural
sciences, Friday for the fine and practical arts, and Saturday for
vocational courses. There is usually a week following the examination period
in which most students are on vacation, but make-up examinations are
scheduled in the same order for the few who were absent or who want to
improve their grade. Only by special permission are students allowed to
take two courses in fields tested on the same day or half-day. They must
take the make-up examination in one of these fields with the understanding
that they waive the privilege of repeating that examination until the
next time it is offered, at the end of the next quarter or semester.

A comprehensive examination in English language arts might be
arranged as follows:

A. First objective section: maximum time, one hour
   Reading comprehension, 30 items, 30 minutes
   Vocabulary, 60 items, 20 minutes

B. First essay: maximum time, two hours (but most students finish
   and leave the examination room in 90 minutes or less)

LUNCH

C. Second objective section: maximum time, one hour
   Listening comprehension, 30 items, 30 minutes
   English usage, 40 items, 20 minutes

D. Second essay: maximum time, two hours (but again most students
   finish and leave in 90 minutes or less)

These time estimates are based on the time students usually take to
finish such items or tasks when there is no pressure of time. The essays are
scheduled at the end of the morning and afternoon sessions so that
students may leave as soon as they have finished. They vary far more in the
time they are able or willing to spend on their papers than in the time they
take in answering objective items. Able and conscientious students usually
spend more time than the average, especially in planning and revision, and
we do not want to cut them off before they have completed the task to their
satisfaction. There are always a few compulsive students, however, who
keep hacking away at their papers long after everyone else has left the
room. No matter how much time one allows, they always want more. At
the end of the scheduled time, one has to take their papers away as gently
as possible and shoo them out.

Combining Scores on Comprehensive Examinations 

If the foregoing outline of a comprehensive examination is accepted as a
workable model, we next face the problem of combining four grades on
the two test essays and four numerical scores on the objective sections. Of
course there will be variations in this outline to suit different courses of
study, but the problem of combining grades and scores will remain.

As a first approach, let us assume that the composition staff has been
using letter grades with plus and minus signs, and that they have agreed to
aim at the distribution of grades predicted on page 30: 5% E, 15% D, 40%
C, 25% B, and 15% A. To combine the four essay grades for each student,
we have to translate the letter grades into numbers as follows:

E-  E   E+  D-  D   D+  C-  C   C+  B-  B   B+  A-  A   A+
1   2   3   4   5   6   7   8   9   10  11  12  13  14  15

Adding four of these numbers for each student to get his total score on
the essay part of the examination presents no problems, but then we have
to combine these totals with much larger numbers representing scores on
the four objective tests, and we want to give the essay and objective
sections equal weight.
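The translation of letter grades into numbers, and the adding of four of them, can be sketched as a simple lookup. The variable and function names and the four sample grades are invented for illustration.

```python
GRADES = ["E-", "E", "E+", "D-", "D", "D+", "C-", "C", "C+",
          "B-", "B", "B+", "A-", "A", "A+"]

# E- = 1, E = 2, ... A+ = 15, as in the grade-to-number table.
GRADE_POINTS = {grade: number for number, grade in enumerate(GRADES, start=1)}


def essay_total(four_grades):
    """Sum of the numbers for the four readers' grades on a student's two essays."""
    return sum(GRADE_POINTS[grade] for grade in four_grades)


# Hypothetical student: B and B- on the first essay, C+ and B+ on the second.
print(essay_total(["B", "B-", "C+", "B+"]))   # 11 + 10 + 9 + 12
```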






This problem is usually ignored in elementary textbooks on measurement,
so there is no standard procedure, but the simplest and most satisfactory
method I have found is to turn the raw scores on each objective test into
letter grades in accordance with the predictions already applied to the
essays. Thus, the top 15 percent of scores on each test get A's; the next 25
percent get B's; the next 40 percent get C's; the next 15 percent get D's;
and the lowest 5 percent get E's. We may have to vary these proportions a
bit when, for example, ten students get the same score at the lower
boundary for B, but the predicted 25 percent will take in just three more
students at this point. We can't give three of them, chosen at random, a B
and the other seven a C if they all made the same score. In such cases, I go
for whichever grade makes the smaller difference in the prediction. Since
only three of the ten came within the B range, I would give all ten some
variety of C (probably C+). If seven of the ten had come within the B range,
I would have given all ten some variety of B (probably B-). After assigning
letter grades to all four objective tests in this fashion, the staff
translates them into the numbers corresponding to each grade in the table
above that was used for the essay grades. Incidentally, this gives each
objective test equal weight, even though the vocabulary test has twice as
many items as the reading comprehension and listening comprehension tests.
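
The tie rule described above can be sketched in code. What follows is only an illustration, not a procedure given in the text: the function name `grades_by_quota`, the use of whole letter grades without plus and minus signs, and the "majority of rank positions" reading of the tie rule are all assumptions made for the example.

```python
from collections import Counter

# Predicted shares in percent, best grade first (from the text).
QUOTAS = [("A", 15), ("B", 25), ("C", 40), ("D", 15), ("E", 5)]


def grades_by_quota(scores, quotas=QUOTAS):
    """Return a dict mapping each distinct raw score to a letter grade."""
    n = len(scores)

    # Turn the percentage quotas into cumulative rank cut-offs.
    cutoffs = []
    cumulative = 0
    for letter, percent in quotas:
        cumulative += percent
        cutoffs.append((letter, round(cumulative * n / 100)))

    def band(rank):
        """Grade that the quota alone would give to a 0-based rank."""
        for letter, end in cutoffs:
            if rank < end:
                return letter
        return cutoffs[-1][0]

    counts = Counter(scores)
    grade_of = {}
    rank = 0
    for score in sorted(counts, reverse=True):
        tied = counts[score]
        # Keep tied students together: the whole group gets the grade
        # that most of its rank positions fall into.
        votes = Counter(band(r) for r in range(rank, rank + tied))
        grade_of[score] = votes.most_common(1)[0][0]
        rank += tied
    return grade_of


# Hypothetical class of 100: ten students tie on a score of 70, and only
# three of them fall inside the B quota, so all ten receive a C.
scores = [90] * 15 + [80] * 22 + [70] * 10 + [60] * 53
print(grades_by_quota(scores)[70])
```

Choosing the plus or minus variety (C+ rather than C, for instance) is left to the staff's judgment, as in the text; the sketch only settles which letter a tied group receives.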

In carrying out this procedure, someone is sure to object, "Those were
the proportions for the various grades that we predicted for the essays.
How can we apply them to the objective scores as well?"

This is natural, since at that point we were trying to set guidelines for
the readers that would make the distribution of essay grades conform to
reasonable expectations. But if you look carefully at what I said on page
30, you will see that I asked each teacher to predict how many students in
each of his or her classes would make each grade on the examination. I
said so repeatedly. Since they knew that the examination would include
objective tests, it is reasonable to apply the same predictions to scores on
these tests.

Now we have eight numbers for each student corresponding to letter
grades on the essays and objective tests. We know that none of these
components is highly reliable (except the vocabulary score) and some are not
very highly correlated with others. The effect of averaging eight numbers
of this sort is to shove everybody closer to the mean than we intended. If we
just add the eight numbers and then divide by 8 to get each student's final
(average) grade on the examination, it is virtually impossible for anyone to
get a final average higher than 11, which means B, or lower than 5, which
means D. We predicted that 15 percent would make A's, 25 percent B's, and
so on, but even though we made the grades on each component come out
that way for the group as a whole, the numbers each student gets depend
so largely on chance that, if we take straight averages, no one will get an A;
about 15 percent will get B's; about 75 percent C's; about 10 percent D's;
and no one will fail.

Any mathematician would have foreseen this result, but English teachers
are not mathematicians, and their first reaction is always shock,
incredulity, and dismay. Someone must have made a mistake in adding or
averaging! No: the figures have all been checked, and they are accurate.
Then some say that since these numbers represent our own judgments, we
are morally obliged to abide by them. Others say no; the averages make no
sense, and we must re-examine the papers and raise or lower enough
grades to make final grades come out in the intended proportions.

Neither faction is right, and the solution is simpler than either one
imagined. Do not average those eight numbers. Simply add them and
make a distribution of total scores. Draw a line under the top 15 percent of
these totals. Any student above that line gets an A; the next 25 percent get
B's; the next 40 percent get C's; and so on. This is the first point at which
those predictions make any real difference, and here above all we should
abide by them.
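
As a rough illustration of "add, then draw lines," the following sketch ranks students by the sum of their eight numbers and hands out final grades by the predicted percentages. The function name `final_grades`, the student labels, and the totals are invented for the example, and ties at a boundary are ignored here for brevity.

```python
# Predicted shares in percent, best grade first (from the text).
QUOTAS = (("A", 15), ("B", 25), ("C", 40), ("D", 15), ("E", 5))


def final_grades(totals, quotas=QUOTAS):
    """Rank students by total score and assign grades by quota.

    totals: dict mapping a student label to the SUM (not the average)
    of that student's eight numbers.
    """
    ranked = sorted(totals, key=totals.get, reverse=True)
    n = len(ranked)
    result = {}
    start = 0
    cumulative = 0
    for letter, percent in quotas:
        cumulative += percent
        end = round(cumulative * n / 100)  # the "line" under this band
        for student in ranked[start:end]:
            result[student] = letter
        start = end
    return result


# Twenty hypothetical students with totals from 60 to 79.
totals = {f"S{i:02d}": 60 + i for i in range(20)}
grades = final_grades(totals)
# With 20 students the quotas give 3 A's, 5 B's, 8 C's, 3 D's, and 1 E,
# even though the highest straight average (79 / 8) is below 10.
print(grades["S19"], grades["S00"])
```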

The essay grades might just as well have been standard scores like those
discussed earlier, based on the normal curve, with a mean of 30 and a
standard deviation of 10, and with no attention at all to the presumed
superiority of this group. Each reader could then divide the papers he
received into top quarter, middle half, and bottom quarter; then pick out a
fifth of the high papers as the very high, and a fifth of the low papers as the
very low. The objective scores could be translated into standard scores in
the same manner, or by actual calculation of standard scores. Once
teachers get used to it, this is far easier than observing the predicted
proportions for the various grades at every point. The result would be that
each student would have eight numbers after his name representing
standard scores, different from and larger than the numbers representing
letter grades. But if one simply added the eight standard scores, made a
distribution of the totals, and drew a line under the top 15 percent for a final
grade of A, the chances are slight that any student who received an A from
the first set of numbers would not also receive an A from the second. All
we need to be sure about is that all eight numbers are on a common scale:
either the standard score scale or the letter grade scale. It is only when we
get the totals on either scale and make a distribution of these totals that we
really need to think about our predictions, but then we should stick to
them like glue.

Suppose the staff insists on giving some of the eight scores more weight
than others. Suppose they decided that reading comprehension was the
most important of the objective scores and should have a weight of 1.5,
and that the score on error-detection was least important and should have
a weight of .8. Very well: a clerk simply multiplies the numbers
representing either standard scores or letter grades on reading comprehension by
1.5; then the numbers representing error-detection by .8. He adds the
eight scores, some weighted in this fashion, for each student, draws a line
under the top 15 percent for a grade of A, under the next 25 percent for a
grade of B, and so on. At the end, it would be advisable to correlate the
weighted with the unweighted totals. Over the years it has been found that
weighting rarely makes any serious difference; students come out in nearly
the same rank order regardless of weighting. Hence my advice would be to
give all parts of the examination equal weight unless, for pedagogical
reasons, you want to emphasize the importance of some part of the course by
saying that it will get extra weight in the examination. It will probably
make little if any difference in students' grades, but it may get them to
work harder at something that they might otherwise neglect.
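
A small sketch of the check suggested above, correlating weighted with unweighted totals, might look like this. The five students and their eight numbers are made up, the position of each component in the list is an assumption, and `pearson` is an ordinary product-moment correlation written out by hand rather than the shortcut method the booklet describes elsewhere.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def total(numbers, weights):
    """Weighted sum of one student's eight numbers."""
    return sum(value * weight for value, weight in zip(numbers, weights))


# Hypothetical data: eight numbers (letter-grade scale, 1-15) per student.
# Assumed order: four essay grades, then reading comprehension (index 4),
# listening, error-detection (index 6), and vocabulary.
students = [
    [13, 12, 11, 12, 14, 11, 12, 13],
    [9, 10, 8, 11, 10, 9, 7, 10],
    [8, 7, 9, 8, 6, 8, 9, 7],
    [5, 6, 4, 7, 5, 6, 8, 5],
    [11, 9, 10, 8, 12, 10, 9, 11],
]
equal = [1, 1, 1, 1, 1, 1, 1, 1]
weighted = [1, 1, 1, 1, 1.5, 1, 0.8, 1]  # weights named in the text

plain_totals = [total(s, equal) for s in students]
weighted_totals = [total(s, weighted) for s in students]
r = pearson(plain_totals, weighted_totals)
print(round(r, 3))
```

With these invented numbers the two sets of totals put the students in the same rank order, which is the pattern the text reports from experience.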

After the examination grade has been determined in the manner just
explained, there is still the problem of combining the examination grade
with the course grade, determined by each teacher. Here again simple
averaging will push everybody closer to the mean than the staff intended,
and again the remedy is the same: add the two numbers for each student
corresponding to his examination grade and course grade. Make a
distribution of these totals and award final grades of A to the top 15 percent, B
to the next 25 percent, C to the next 40 percent, and so on. It is desirable to
report all three: examination grade, course grade, and final grade. If any
student or parent objects that the final grade is not precisely the average of
the other two, explain that these are "adjusted averages."



A Note on the Significance of Differences 

Since so few English teachers conduct controlled experiments, and
those who do have statistical help, I shall not devote much attention to the
significance of differences between the averages of groups taught with
different materials or methods. But since books and articles on the teaching
of English often state that the difference between the results achieved by
Method A and Method B was not significant, or was significant at the .05,
.01, or .001 level, I want you to have some notion of what it means.

The basic idea is that there is a good deal of chance (random) variation
in all educational measures, and the amount of variation you would find in
two out of three repetitions of the same measurement operation is called
the "standard error" of that measure. This has nothing to do with
mistakes, with bias, or with external conditions (such as an infernally hot
day); it most commonly refers to chance variations from one sample of
tasks or performance to another. For example, I said on page 28 that the
standard error of essay grades was about 5 points on the standard score
scale that I proposed (with a mean of 30 and a standard deviation of 10).
That is, if you had the same essay graded repeatedly by different
competent readers and kept averaging the grades until you were certain what the
true grade was, you would find that about two-thirds of the grades leading
to this final average lay within one standard error (5 points) of the true
grade, and 95 percent of them lay within two standard errors (10 points).
I shall say no more about the standard error of individual scores because
they are so large that I find it the best policy to disregard them. But the
standard error of the average of large groups (more than 100 students)
generally used in educational experiments is much smaller: it is the
standard deviation divided by the square root of the number of students. I
mentioned that our hypothetical vocabulary test of sixty items (page 39)
might have a standard deviation as large as 10. If it were given to 100
students, you would divide the standard deviation (10) by the square root of
100 (10), and so the standard error of the average of this group would be
just 1 raw-score point.
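
The arithmetic in the last two sentences can be written as a one-line function. The name `standard_error_of_mean` is chosen for this illustration only.

```python
from math import sqrt


def standard_error_of_mean(standard_deviation, number_of_students):
    """Standard deviation divided by the square root of the group size."""
    return standard_deviation / sqrt(number_of_students)


# The vocabulary-test example from the text: SD of 10, 100 students.
print(standard_error_of_mean(10, 100))
```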

I also said that the average score (mean) of my illustrative group was 40.
Suppose that another group, treated in a different way, made an average
score of 45 on this same test and also had a standard deviation of 10,
hence a standard error of 1 point. Is that difference of 5 points between the
two averages a true (significant) difference, or is it within the range of
chance variation that one should expect in two administrations of the
same test?

To find out, you have to compute the standard error of the difference.
You square the standard error of the first average (1 x 1 = 1), square the
standard error of the second average (1 x 1 = 1), add the two squares (1 +
1 = 2), and take the square root of the sum (2), which is 1.41, as you can
find in any table of squares and square roots. Then the significance
(reality) of the difference is judged against four standards:

1. If the difference (5 points) is less than twice as large as the standard
error of that difference (1.41 x 2 = 2.82), it is not significant. This
does not assert that it is, but that it could be, a chance variation. But
since 5 is much larger than 2.82, it passes this first test.

2. If the difference is between 2 and 2.6 times as large as its standard
error, it is significant at the .05 level, meaning that there are less than
5 chances in a hundred that a difference this large would be found if
there were no true difference. But 2.6 times 1.41 is 3.67, and the
difference of 5 is larger than this, so we can go on to the next level of
significance.

3. If the difference is between 2.6 and 3 times as large as its standard
error, it is significant at the .01 level, meaning that there is less than
one chance in a hundred that it was a fluke. But 3 x 1.41 is 4.23, and
5 is larger than this, so we go on to the next level.

4. If the difference is more than 3 times as large as its standard error, it
is significant at the .001 level, meaning that there is less than one
chance in a thousand that it was a fluke. As we have just seen, 3 x 1.41
= 4.23, and 5 is larger than this, so it is significant at the .001 level.
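
The four standards above can be sketched as a short routine. It uses the rule-of-thumb multipliers given in the text (2, 2.6, and 3), not exact values from a statistical table, and the function names are invented for the illustration.

```python
from math import sqrt


def se_of_difference(se_first, se_second):
    """Square each standard error, add the squares, take the square root."""
    return sqrt(se_first ** 2 + se_second ** 2)


def significance_level(difference, se_diff):
    """Apply the four standards from the text.

    Returns None when the difference is not significant, otherwise the
    level as a string: ".05", ".01", or ".001".
    """
    ratio = abs(difference) / se_diff
    if ratio < 2:
        return None
    if ratio < 2.6:
        return ".05"
    if ratio <= 3:
        return ".01"
    return ".001"


# The worked example: two averages 5 points apart, each with a
# standard error of 1 point.
se_diff = se_of_difference(1, 1)
print(round(se_diff, 2), significance_level(5, se_diff))
```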

There are many different types of standard errors: of correlations,
proportions, regressions, etc., each computed in a different way and yielding
results of different orders of magnitude. There are also many different
ways of computing the significance of differences between experimental
groups: chi-square, analysis of variance and covariance, regression
analysis, etc. Once you get into this statistical maze, you will never get out
without help. But for most of the articles you will read that refer to the
significance of a difference between two groups, the basic idea of "significance"
is conveyed by the classical procedure that I have just explained: if the
difference between the two averages is 2, 2.6, or 3 times as large as its own
standard error, it is significant at the .05, .01, or .001 level respectively,
referring to the chances in a hundred or a thousand that a difference this
large would be found if there were no true difference.

One final point: "significant" does not necessarily mean "important";
it means only "non-chance." In statewide testing programs in which
several hundred thousand students are involved, one group of 10,000 might be
compared with another group of 10,000. To get the standard error of each
average, you would have to divide the standard deviation by the square
root of this number, which is 100. That would make the standard error so
small that a difference of a tenth of a point might be significant, in the
sense that it could not be attributed to chance, but it would have no
educational or practical importance. Perhaps I should add that the .001 level
does not mean that the difference was 10 times as large as at the .01 level;
it only means that you are ten times as sure that there was some difference.






Initiating Staff Grading 
of Test Essays 

It is no easy matter to introduce staff grading of unidentified test essays
on the same topic in a staff that has four or more teachers of English.
Teachers may be so sensitive to possible criticism of their results that they 
will not let anyone else even see the essays written by their students, let 
alone grade them. You may reassure them that no one will know which 
papers were written by their students because they will be identified only 
by numbers chosen at random by each student. Then they will want to 
know how anyone can possibly grade a paper fairly, not knowing the stu- 
dent. You may reply that in such examinations we are grading the writing, 
not the student, and that a final grade of D for one student may represent 
a triumph, while a final grade of B for another may represent a shattering 
disappointment. If we profess to be teaching composition, we ought to be 
able to tell which papers are better than others, regardless of who wrote 
them. Still, the argument goes on. 

I see little hope of winning over such people by argument or persuasion.
One has to introduce a series of experiences that will open their eyes to the
extent of disagreement in the staff on the worth of selected papers that
they all grade independently. I used to do this by getting one paper per
month, each time from a different teacher, making typed copies with all
identification, comments, corrections, and grades removed, and having
each teacher grade it, comment on it, and return it to me at least one day
before our next staff meeting. In that meeting I would write on the
blackboard what grades the paper had received, and at the start I was pretty
sure to get four or five different grades. The teachers were dismayed, but I
tried not to be. I explained that such differences in grading standards
always came to light whenever a staff began to study the reliability of its
essay grades, and the only way to improve was to discuss our differences,
examine the reasons behind them, and gradually develop standards that
would bring our grades closer together. I said it would be foolish to expect
anything like perfect agreement in judgments of writing ability; all we
could hope for would be the amount of agreement represented by a
correlation of about .50 between grades assigned independently to each set of
test essays by pairs of readers. Since that is the usual correlation between
height and weight among adults of the same sex, it would still leave plenty
of room for legitimate differences of opinion. But we were starting with a
correlation of about .30 in our grades on this paper, and that was
altogether too low to be fair to students.

I would then call upon some respected staff member to explain why he 
gave this paper an A. Next I would ask a friend of his to explain why he 
gave it a D or an E. Other teachers would express agreement or disagree- 
ment with these explanations and tell why they gave the paper some other 
grade. Thus we would move toward an elucidation of the grading problem 
presented by this paper and what policy we should adopt if we found such 
a paper in an examination. These discussions, which were amicable but 
spirited and often witty, proved to be more interesting than what we had 
previously done in staff meetings, and they gradually moved the staff 
toward acceptance of the idea that maybe more than one point of view 
should be represented in grading such important essays as those written in
examinations. 

We next tried out this idea in the least threatening case: each teacher
chose one other teacher with whom he was willing to exchange papers on a
topic that both had assigned to at least one class at the same level. Each
graded the papers of both classes independently, and without writing
anything on the papers. Then they compared their grades and resolved
differences of more than one full grade by discussion. We learned the easy way
to compute the correlation between the two sets of grades (before resolution
of differences) that was explained on page 33. These figures gradually
convinced us that a single essay, graded independently by two readers, was not
enough to yield the reliability that we wanted in our examinations, so we
gradually developed the type of examination outlined on pages 41-42, in
which morning essays were graded by one pair of readers and afternoon
essays by another pair. Later, as the staff gained experience with this
method of grading, they decided that it would be a good idea to expose
themselves to a wider range of viewpoints than that of their best friend in
the department, so they let the department head assign sets of papers to
pairs of readers that were either chosen at random or systematically
rotated.

John Stalnaker, long president of the Merit Scholarship Foundation,
recalls this incident from his early days as Examiner in English at the
University of Chicago.

In one of his experiments he had a few hundred papers to grade. He called
in four of his most experienced readers and told them, "I want you to grade
these papers but not on your regular scale of A to F. I know that you all have
different ideas about what those letters mean. Just sort these papers into five
piles in order of merit. Then mark the highest pile 4, the next pile 3, and so
on down to 0."

They agreed to do so, but about a week later they came to his office and
said, "We're sorry, John, but we could not do what you wanted. It turned
out that there weren't any '4' papers. But we did the best we could. We
sorted them into five piles, but we had to mark them 3, 2, 1, 0, and 00."

This is a shortened and simplified account of the development of the 
staff grading procedures I have recommended— with all the mistakes, set- 
backs, and wasted motion left out. Some of these procedures represent 
changes from those I suggested in earlier publications; more recent studies 
have changed my mind. Those that you adopt must be suited to your 
course of study, your student population, and the convictions and prefer- 
ences of your staff. But one requirement is almost universal. At some point 
someone with authority— usually the principal or dean— must tell the staff 
to stop arguing and try something— no matter what. Without that push, 
nothing will happen. 






Appendices 






Descriptions of Papers Rated 
High, Middle, and Low on Eight Qualities

Some readers may be disappointed that the procedures recommended 
for ascertaining and improving the reliability of essay grades all involved
the cooperation of at least two teachers. What they probably hoped to 
learn was some way of rating papers that would improve the reliability of 
their own grades so that they could have greater confidence in their fair- 
ness and accuracy and could explain to students exactly why their grade 
was high or low. In other words, what they wanted was a list of things to 
look for in student compositions and how many points to give for this or 
take off for that.

A collection of readings offering suggestions of this sort was published
by the National Council of Teachers of English, 1111 Kenyon Road,
Urbana, Illinois 61801, in 1965: A Guide for Evaluating Student
Composition, edited by Sister M. Judine, IHM. It is a paperbound volume of 162
pages, and sells for $2.75.

Although these papers contain much practical wisdom, I have never had 
much confidence in any scheme for rating papers that does not involve 
comparison with independent ratings of another person and discussion of 
papers on which there is a substantial difference of opinion. I have never
seen any solid evidence in print that any of these schemes improves relia- 
bility. 

If you want to use some sort of checklist to improve the consistency of 
your ratings, the only help I can offer is an example of the way in which
guidelines for rating papers might be developed. After our factor analysis 
of judgments of writing ability, described on pages 5-10 of this booklet, we 
proceeded to a study of writing improvement in twelve school districts in
the state of New York. All students in grades 9 and 10 who were involved
in this study wrote one test paper per month on a topic set by us, the same
topic for both grades. As indicated on page 11, these test essays were
written on paper that yielded three sharp, clean copies, two of which were
sent back to different schools for rating on the following type of rating slip.

Topic ______  Reader ______  Paper ______

                Low       Middle       High
Ideas           2     4     6     8     10
Organization    2     4     6     8     10
Wording         1     2     3     4     5
Flavor          1     2     3     4     5
Usage           1     2     3     4     5
Punctuation     1     2     3     4     5
Spelling        1     2     3     4     5
Handwriting     1     2     3     4     5

                                  Sum ______



Teachers encircled one number after the name of each quality to indi- 
cate their rating of the paper on that quality. At first the numbers all ran 
from 1 to 5. but since their courses concentrated on ideas and organiza- 
tion, they persuaded us to give double weight to those ratings by doubling 
the numbers representing each scale position. This weighting had no basis 
in research, but it seemed reasonable to give extra credit for the qualities
these teachers wished to emphasize.

These eight qualities are short forms of the names of the five factors in
judgments of writing ability revealed by our factor analysis, except that the
mechanics factor is broken up into its logically distinguishable components
(usage, punctuation, and spelling), and we added Remondino's
factor (see page 9), here called "handwriting." At the right are spaces for
subtotals of ratings on the first four factors, which we called "general
merit," and on the last four, which we called "mechanics," and then a
space for the sum of these two, the total rating. Note that, if a student gets
the lowest possible rating on everything, his total will be 10; if all his
ratings are in column 2, his total will be 20; and similar totals for the other
three columns are 30, 40, and 50. These coincide with the standard scores
of 10, 20, 30, 40, and 50 corresponding to letter grades of F, D, C, B, and
A, as explained on pages 27-28. Thus they were compatible with and led
into the later use of standard scores; meanwhile they developed a clear
idea of what the standard scores meant in terms of factors that make a
difference in the grades of skilled readers.
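
To make the arithmetic of the rating slip concrete, here is a minimal sketch that turns the circled column positions into the two subtotals and the total. The function name `score_slip` and the dictionary layout are assumptions for the illustration; the double weights for ideas and organization come from the text.

```python
# Points per column position: ideas and organization count double.
WEIGHTS = {
    "Ideas": 2, "Organization": 2, "Wording": 1, "Flavor": 1,
    "Usage": 1, "Punctuation": 1, "Spelling": 1, "Handwriting": 1,
}
GENERAL_MERIT = ("Ideas", "Organization", "Wording", "Flavor")
MECHANICS = ("Usage", "Punctuation", "Spelling", "Handwriting")


def score_slip(columns):
    """Return (general merit, mechanics, total) for one rating slip.

    columns: dict mapping each quality to the column circled, from
    1 (lowest) to 5 (highest).
    """
    points = {quality: columns[quality] * WEIGHTS[quality]
              for quality in WEIGHTS}
    general = sum(points[q] for q in GENERAL_MERIT)
    mechanics = sum(points[q] for q in MECHANICS)
    return general, mechanics, general + mechanics


# A paper rated in the lowest column on every quality totals 10;
# one rated in the middle column throughout totals 30.
lowest = {quality: 1 for quality in WEIGHTS}
middle = {quality: 3 for quality in WEIGHTS}
print(score_slip(lowest), score_slip(middle))
```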

Although these factors are represented on the rating slip only by short
forms of their names, we developed an initial understanding of what they
meant in all-day Saturday workshops that these teachers were paid to
attend. We also gave them practice in rating sample sets of papers that had
previously been rated on those eight qualities by expert readers. We kept
rating sets of those papers and discussing differences of opinion until a
reasonable consensus was reached.

After the test papers had been rated in this fashion for one school year,
heads of these departments met in a week-long workshop during the
summer. Each brought a small sample of test papers on each topic that had
been rated high (top quarter), middle (middle half), or low (bottom
quarter), and that he or she regarded as typical papers at these levels of merit.
We made photocopies of these papers and studied them together until we
were able to agree upon brief descriptions of their salient characteristics.
These descriptions were used throughout the following year as a guide in
rating the monthly test papers, and particularly in training new teachers to
rate papers on these qualities. At the end, the department heads met again
and revised the descriptions, chiefly by cutting out parts that had been
more confusing than helpful. The revised descriptions are reproduced in
the following pages.

By the end of this study, we had come to look upon these guidelines as a 
training device that teachers may well use for a year or two to develop a 
common set of standards and a systematic way of thinking about the qual- 
ities that should enter into their judgment of a paper. After two years (at 
most) they move easily and naturally into the use of standard scores as a 
quicker and easier way to indicate their judgment of the general merit of a 
paper. We call this "rating on general impression," but it is no longer a
blur: it is a quick summing up of characteristics that determine whether a
paper is high, middle, or low in general merit. The teachers also have a
common vocabulary for discussing the merits and defects of papers on
which their grades disagree. They quickly recognize their agreement on
perhaps six or seven of these eight qualities and "zero in" on the one or
two that accounted for the discrepancy in their grades.

I. GENERAL MERIT

1. Ideas 

High. The student has given some thought to the topic and writes what he
really thinks. He discusses each main point long enough to show clearly
what he means. He supports each main point with arguments, examples,
or details; he gives the reader some reason for believing it. His points are
clearly related to the topic and to the main idea or impression he is trying
to convey. No necessary points are overlooked and there is no padding.

Middle. The paper gives the impression that the student does not really
believe what he is writing or does not fully understand what it means. He
tries to guess what the teacher wants and writes what he thinks will get by.
He does not explain his points very clearly or make them come alive to the
reader. He writes what he thinks will sound good, not what he believes or
knows.

Low. It is either hard to tell what points the student is trying to make or
else they are so silly that, if he had only stopped to think, he would have
realized that they made no sense. He is only trying to get something down
on paper. He does not explain his points; he only asserts them and then
goes on to something else, or he repeats them in slightly different words.
He does not bother to check his facts, and much of what he writes is
obviously untrue. No one believes this sort of writing, not even the student
who wrote it.

2. Organization 

High. The paper starts at a good point, has a sense of movement, gets
somewhere, and then stops. The paper has an underlying plan that the
reader can follow; he is never in doubt as to where he is or where he is
going. Sometimes there is a little twist near the end that makes the paper
come out in a way that the reader does not expect, but it seems quite
logical. Main points are treated at greatest length or with greatest emphasis,
others in proportion to their importance.

Middle. The organization of this paper is standard and conventional.
There is usually a one-paragraph introduction, three main points each
treated in one paragraph, and a conclusion that often seems tacked on or
forced. Some trivial points are treated in greater detail than important
points, and there is usually some dead wood that might better be cut out.

Low. This paper starts anywhere and never gets anywhere. The main
points are not clearly separated from one another, and they come in a
random order, as though the student had not given any thought to what he
intended to say before he started to write. The paper seems to start in one
direction, then another, then another, until the reader is lost.

3. Wording 

High. The writer uses a sprinkling of uncommon words or of familiar
words in an uncommon setting. He shows an interest in words and in
putting them together in slightly unusual ways. Some of his experiments with
words may not quite come off, but this is such a promising trait in a young
writer that a few mistakes may be forgiven. For the most part, he uses
words correctly, but he also uses them with imagination.

Middle. The writer is addicted to tired old phrases and hackneyed
expressions. If you left a blank in one of his sentences, almost anyone could guess
what word he would use at that point. He does not stop to think how to say
something; he just says it in the same way as everyone else. A writer may
also get a middle rating on this quality if he overdoes his experiments with
uncommon words: if he always uses a big word when a little word would
serve his purpose better.

Low. The writer uses words so carelessly and inexactly that he gets far too
many wrong. These are not intentional experiments with words in which
failure may be forgiven; they represent groping for words and using them
without regard to their fitness. A paper written in a childish vocabulary
may also get a low rating on this quality, even if no word is clearly wrong.

4. Flavor 

High. The writing sounds like a person, not a committee. The writer
seems quite sincere and candid, and he writes about something he knows,
often from personal experience. You could not mistake this writing for the
writing of anyone else. Although the writer may assume different roles in
different papers, he does not put on airs. He is brave enough to reveal
himself just as he is.

Middle. The writer usually tries to appear better or wiser than he really is.
He tends to write lofty sentiments and broad generalities. He does not put
in the little homely details that show that he knows what he is talking
about. His writing tries to sound impressive. Sometimes it is impersonal
and correct but colorless, without personal feeling or imagination.

Low. The writer reveals himself well enough but without meaning to. His
thoughts and feelings are those of an uneducated person who does not
realize how bad they sound. His way of expressing himself differs from
standard English, but it is not his personal style; it is the way uneducated
people talk in his neighborhood. Sometimes the unconscious revelation is
so touching that we are tempted to rate it high on flavor, but it deserves a
high rating only if the effect is intended.

II. MECHANICS

5. Usage, Sentence Structure

High. There are no vulgar or "illiterate" errors in usage by present
standards of informal written English, and there are very few errors in points
that have been discussed in class. The sentence structure is usually correct,
even in varied and complicated sentence patterns.

Middle. There are a few serious errors in usage and several in points that
have been discussed in class, but not enough to obscure meaning. The
sentence structure is usually correct in familiar sentence patterns, but there
are occasional errors in complicated patterns: errors in parallelism,
subordination, consistency of tenses, reference of pronouns, etc.

Low. There are so many serious errors in usage and sentence structure
that the paper is hard to understand.

6. Punctuation, Capitals, Abbreviations, Numbers

High. There are no serious violations of rules that have been taught,
except slips of the pen. Note, however, that modern editors do not require
commas after short introductory clauses, around nonrestrictive clauses, or
between short coordinate clauses unless their omission leads to ambiguity
or makes the sentence hard to read. Contractions are acceptable, often
desirable.

Middle. There are several violations of rules that have been taught, as
many as usually occur in the average paper. Counts of such errors in high,
middle, and low papers at various ages and socioeconomic levels would be
desirable in order to establish standards.

Low. Basic punctuation is omitted or haphazard, resulting in fragments,
run-on sentences, etc.

7. Spelling 

High. Descriptions of spelling levels are most often used in grading test
papers written in class. Since there is insufficient time to make full use of
the dictionary, spelling standards should be more lenient than for papers
written at home. The high paper (at ages 14-16) usually has not more than
five misspellings, and these occur in words that are hard to spell. The
spelling is consistent; words are not spelled correctly in one sentence and
misspelled in another, unless the misspelling appears to be a slip of the pen.
If a poor paper has no misspellings, it gets a high rating on spelling, even if
no difficult words are used.

Middle. There are several spelling errors in hard words and a few
violations of basic spelling rules, but no more than one finds in the average
paper. Spelling standards differ so sharply from grade to grade and from
one socioeconomic level to another that each school would do well to make
a distribution of spelling errors per hundred words (at least for test papers
written in class) and relate its ratings to this distribution.

Low. There are so many spelling errors that they interfere with
comprehension.
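
The distribution of spelling errors per hundred words suggested above is easy to tabulate. The following Python sketch is only an illustration: the function names, the sample counts, and the decision to rate the best quarter of papers high and the worst quarter low are assumptions, not part of the procedure described here.

```python
from statistics import quantiles


def errors_per_hundred(misspellings: int, word_count: int) -> float:
    """Spelling errors per hundred words for one paper."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 100.0 * misspellings / word_count


def rate_spelling(rates: list[float]) -> list[str]:
    """Label each paper relative to the school's own distribution.

    Assumption: papers at or below the first quartile of error rates are
    rated high, papers at or above the third quartile are rated low.
    """
    q1, _, q3 = quantiles(rates, n=4)
    labels = []
    for rate in rates:
        if rate <= q1:
            labels.append("high")
        elif rate >= q3:
            labels.append("low")
        else:
            labels.append("middle")
    return labels


# Hypothetical (misspellings, words written) counts for eight test papers.
papers = [(1, 250), (3, 300), (4, 200), (6, 300),
          (8, 320), (10, 250), (15, 300), (30, 300)]
rates = [errors_per_hundred(m, w) for m, w in papers]
labels = rate_spelling(rates)
```

A school would replace the hypothetical counts with its own and might well choose different cut points; the point is only that the ratings are tied to the local distribution rather than to a fixed number of errors.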

8. Handwriting, Neatness 

High. The handwriting is clear, attractive, and well spaced, and the rules
of manuscript form have been observed.

Middle. The handwriting is average in legibility and attractiveness. There
may be a few violations of rules for manuscript form if there is evidence of
some care for the appearance of the page.

Low. The paper is sloppy in appearance and difficult to read. It may be
excellent in other respects and still get a low rating on this quality.



Topics for Test Essays 

If you have to set topics for test essays that will be written by the
students of several teachers, you should have a way of securing ratings by
these teachers of quite a long list of topics, preferably those that you or
they have used and found appropriate for short, impromptu papers that
can be planned, written, and revised in the time available and under the
pressure of an examination. Unless these topics are selected from a list
that teachers have approved, they almost always complain that the
students would have written much better had it not been for the awful topic
you gave them. Either it was too difficult and beyond their experience or it
was so dull and hackneyed that no one could get interested in it.

The following topics are typical of those suggested by teachers for test
essays. Most of them can be handled successfully by students in secondary
schools (ages 12-17), but those near the end of the list seem more suitable
for college students. Although I have no objection to your using any of
these that seem interesting, I hold no brief for this particular list. I assume
that you will compile a similar list of topics that you and the other teachers
have found that your students can handle. Often the topics are suggested
by papers that students have written on topics of their own choice. I make
copies of such lists and hand them out to teachers at the first staff meeting
of the year. I ask them to put a 2 before the topics they like best, a 1 before
those that they accept, a 0 before those that they reject, and no mark
before those about which they have no opinion. At the next meeting I hand
out a shorter list of acceptable topics that received the highest ratings. It is
understood that topics for all examinations concerned with writing ability
will be taken from this list, but I try to keep the topic for any given
examination a secret until the day of the test. Otherwise it sometimes happens
that the less secure teachers give their students such broad hints about the
nature of the topic that some write the essay beforehand, or get a friend to
write it, and commit it to memory. Other teachers may assign a topic that
is almost like the one to be used in the test and then give detailed
instructions on how to write such a paper. In one examination we found
thirty-five papers that all started with the same topic sentence. If it is even
suspected that some teachers are giving their classes more direct
preparation for the examination than others, students will lose confidence in the
fairness of the grades. Hence the only safe policy is secrecy. If the teachers
keep their lists of approved topics, it is easy to pass the word just before
the morning essay, "Topic 8." Then, if there is to be an afternoon essay,
you wait until after lunch to announce "Topic 12." Since these topics are
usually short, each teacher writes the selected topic on his blackboard. But
if the topic is lengthy, and there is "stimulus material" on which students
are to comment, the examination papers must be duplicated and handed
out in sealed envelopes on the day of the test. Then it is understood that
the seal may be broken only in the presence of the students who are ready
to take the examination.
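
The rating of topics by teachers (2 for topics liked best, 1 for those accepted, 0 for those rejected, no mark for no opinion) can be tallied mechanically. The sketch below is one possible reading, not the procedure actually used: the names are hypothetical, and two rules are assumptions, namely that a topic counts as acceptable when fewer than half of the teachers reject it, and that topics are ranked by the sum of their marks.

```python
from typing import Optional

# One ballot per teacher: topic -> 2, 1, 0, or None for "no mark".
Ballot = dict[str, Optional[int]]


def shortlist(ballots: list[Ballot], keep: int) -> list[str]:
    """Return the `keep` acceptable topics with the highest total marks."""
    totals: dict[str, int] = {}
    rejections: dict[str, int] = {}
    for ballot in ballots:
        for topic, mark in ballot.items():
            totals.setdefault(topic, 0)
            rejections.setdefault(topic, 0)
            if mark is None:  # no opinion: contributes nothing
                continue
            if mark not in (0, 1, 2):
                raise ValueError(f"invalid mark {mark!r} for {topic!r}")
            totals[topic] += mark
            if mark == 0:
                rejections[topic] += 1
    # Assumption: acceptable means fewer than half the teachers reject it.
    acceptable = [t for t in totals if 2 * rejections[t] < len(ballots)]
    # sorted() is stable, so tied topics keep their original list order.
    ranked = sorted(acceptable, key=lambda t: totals[t], reverse=True)
    return ranked[:keep]


# Hypothetical ballots from three teachers on four topics.
ballots = [
    {"I saw it happen": 2, "On being alone": 1,
     "What scares me": 2, "Robbie the Robot": 0},
    {"I saw it happen": 2, "On being alone": None,
     "What scares me": 1, "Robbie the Robot": 0},
    {"I saw it happen": 1, "On being alone": 1,
     "What scares me": 2, "Robbie the Robot": 2},
]
approved = shortlist(ballots, keep=2)
```

With these ballots "Robbie the Robot" is dropped because two of the three teachers reject it, even though one likes it best; a staff that prefers a different rule need only change the line that builds `acceptable`.
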
Here is the illustrative list of topics suggested by teachers: 

1. I saw it happen
2. What I learned from experience
3. What I'll be doing ten years from now
4. If I could do it over
5. On being alone
6. My day in the palace
7. Flight to Planet X
8. Robbie the Robot
9. If an ancient Greek came to town
10. What happened when some machine went berserk
11. My idea of happiness
12. What scares me
13. My own standard of living
14. Were people happier in days gone by?
15. Some things do not change
16. The trouble with families
17. Mistakes parents make with children
18. Why teenagers rebel
19. Are teenagers conservative?
20. When should teenagers be treated as adults?
21. There's nobody like
22. Who should go to college?
23. My idea of an educated person
24. What I like about life in my country
25. What I dislike about life in my country
26. My country's contributions to mankind
27. In what ways are all men equal?
28. Is peaceful coexistence possible?
29. Can a world government prevent war?
30. What is the spirit of our time?



CHOOSING A SUBJECT 

I was privileged to attend the last regular lecture at Harvard of the great
teacher of the Bible, Kirsopp Lake. It was the day before the final
examination, and I think he tried to ease the tension by telling this story.

"Gentlemen, I had a wonderful dream last night. I dreamed that I was
sitting on a cloud at Judgment Day, watching all the tribes of earth assemble.
They all came together in a great plain and sat down.

"Then, out of the circumambient mist, a great hand arose and began
writing on a celestial blackboard in letters that all the world could read.

"It wrote out the Ten Commandments, and then, in typical examination
fashion, it added: STUDENTS CHOOSE SIX."






Objective Items Based 
on a Central Theme 

If this short course on grading essays written in examinations is widely
used, other short courses will be written that will deal with the
preparation, review, tryout, selection, scoring, and analysis of objective items far
more extensively than we can do here. It seemed wise, however, to include
a brief appendix on types of objective items that teachers of English will
accept, since so many of them have a deep-seated prejudice against any
use of objective tests. The previous discussion may have convinced you
that short sections of objective items ought to be included in any final
examination on English language arts for at least two reasons. First, the
course is bound to include reading and listening comprehension,
vocabulary, and grammar or usage, all of which can be tested more quickly,
easily, and reliably by objective items than by written answers. Second, we
have seen that the highest over-all reliability that American examiners can
consistently attain in grades on essays written in final examinations is
about .70, and this is too low to be entirely fair to students or to detect
improvements in the course. It is most commonly raised to acceptable levels
by scores on the objective sections, which yield much higher reliabilities
per unit of testing time.
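
One standard way to think about reliability per unit of testing time is the Spearman-Brown formula, which estimates the reliability of a test lengthened by some factor with comparable material. The sketch below applies it to the figure of .70 given above; treating the essay section this way is an assumption made for illustration, and the function name is hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Estimated reliability of a test lengthened by `length_factor`.

    Spearman-Brown formula: k * r / (1 + (k - 1) * r).
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability
            / (1.0 + (length_factor - 1.0) * reliability))


# Illustration only: doubling the essay testing time, starting from .70.
doubled = spearman_brown(0.70, 2.0)
```

Under these assumptions, doubling the essay time raises the estimate only to about .82, which suggests why adding objective sections is usually the more economical way to reach an acceptable level.
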

Still, teachers of English tend to regard these objective sections as, at
best, a disagreeable necessity which can test only the most superficial
aspects of proficiency in English. To help you convince your colleagues that
objective tests need not be stupid, I should like to show you a test that I
wrote some years ago and used in one of my examinations at the University
of Chicago. Its distinctive characteristic is its unity. In almost all objective
tests, no item has any connection with any other item, but here the whole
test deals with a single problem of universal concern. The problem is
discussed in three short passages that present contrasting points of view, and
students must answer twenty items that test not only comprehension of
each passage but also an understanding of relationships between these
passages. Next, there is a short but complete paper written by a student
who was asked to compare the views expressed in these three passages and
then state his own position on this issue. Note that the twenty items
following this paper will deal with larger aspects of writing than mechanical
errors. (A few examples of discrete items on ability to detect errors in
sentences will be given later.) Finally, there is a writing assignment dealing
with one important issue that is a part of the general problem discussed in
the three passages.

The test as it stands is probably too hard for high school students. In
fact, it was a bit too hard even for my college students. I chose a hard test
as an illustration so that intelligent and well-prepared teachers of English
would themselves get interested in it and find it hard to answer some of the
questions. I think they will agree that, whatever else it may be, it is not
superficial. It is intended only as an illustration of a possible format for
objective tests of reading and writing that you and your colleagues may
want to prepare for your own examinations, using easier material and
simpler types of objective items. I have found it effective as what might be
regarded as propaganda for some objective sections in tests of English
language arts that rely chiefly on essays. Many fine teachers of English have
said to me, "I have never had any use for objective tests, but I can't despise
this one."



The Reading Test 

Directions. Read all three passages before answering the questions that
follow.

Passage I 

The nation, with all its so-called internal improvements, which are all
external and superficial, is just an unwieldy and overgrown establishment,
cluttered with furniture and tripped up by its own traps, ruined by luxury
and heedless expense, by want of calculation and a worthy aim; and the
only cure for it is in a rigid economy, a stern and more than Spartan
simplicity of life and elevation of purpose. It lives too fast. Men think it
essential that the Nation have commerce, and talk through a telegraph, and
ride thirty miles an hour, whether they do or not; but whether we should
live like baboons or like men is a little uncertain. If we do not get out
sleepers [large pieces of wood to which railroad tracks are nailed], and
forge rails, and devote days and nights to the work, but go to tinkering
upon our lives to improve them, who will build railroads? And if railroads
are not built, how shall we get to heaven in season? But if we stay at home
and mind our business, who will want railroads? We do not ride on the
railroad, it rides on us. Did you ever think what those sleepers are that
underlie the railroad? Each one is a man, an Irishman or a Yankee man. The
rails are laid on them, and they are covered with sand, and the cars run
smoothly over them. They are sound sleepers, I assure you. And every few
years a new lot is laid down and run over; so that, if some have the
pleasure of riding on a rail, others have the misfortune to be ridden upon. And
when they run over a man who is walking in his sleep and wake him up,
they suddenly stop the cars and make a hue and cry about it, as if this were
an exception. I am glad to know that it takes a gang of men for every five
miles to keep the sleepers down and level in their beds, for this is a sign
that they may sometime get up again.

Passage II

Myself when young did eagerly frequent
Doctor and Saint, and heard great argument
   About it and about: but evermore
Came out by the same door where in I went.

With them the seed of wisdom did I sow,
And with mine own hand wrought to make it grow;
   And this was all the harvest that I reaped:
"I came like water, and like wind I go."

Into this universe, the why not knowing
Nor whence, like water willy-nilly flowing:
   And out of it, as wind along the waste,
I know not whither, willy-nilly blowing.

Waste not your hour, nor in the vain pursuit
Of This and That endeavor and dispute;
   Better be jocund with the fruitful grape
Than sadden after none, or bitter, Fruit.

The moving finger writes; and, having writ,
Moves on: nor all your piety nor wit
   Shall lure it back to cancel half a line,
Nor all your tears wash out a word of it.






Passage III 

No man can serve two masters: for either he will hate the one and love
the other; or else he will hold to the one and despise the other. Ye cannot
serve God and mammon.

Therefore I say unto you, Take no thought for your life, what ye shall
eat, or what ye shall drink; nor yet for your body, what ye shall put on. Is
not the life more than meat, and the body than raiment? Behold the fowls
of the air; for they sow not, neither do they reap, nor gather into barns; yet
your heavenly Father feedeth them. Are ye not much better than they?

Which of you by taking thought can add one cubit unto his stature?

And why take ye thought for raiment? Consider the lilies of the field,
how they grow: they toil not, neither do they spin: and yet I say unto you
that even Solomon in all his glory was not arrayed like one of these.

Wherefore, if God so clothe the grass of the field, which today is, and
tomorrow is cast into the oven, shall he not much more clothe you, O ye of
little faith? Therefore take no thought, saying, What shall we eat? or,
What shall we drink? or, Wherewithal shall we be clothed? For after all
these things do the Gentiles seek; for your heavenly Father knoweth that
ye have need of these things. But seek ye first the kingdom of God and his
righteousness; and all these things shall be added unto you.

Take therefore no thought for the morrow, for the morrow shall take
thought for the things of itself. Sufficient unto the day is the evil thereof.

Directions continued. Mark the best answer to each question. Remember
that no short answer to a question about a literary work can be completely
correct. The best answers to the following questions need be only a little
better than the other answers.

1. Which of the following questions is the central concern of all three
passages?

   1. Is the pursuit of pleasure a desirable goal in life?
   2. Is hard work necessary for success in life?
   3. What should be our chief purpose in life?
   4. Is the pursuit of material values contrary to religion?

2. Which of the following best represents the goal stated in Passage I?

   1. The development of the Nation
   2. Simplicity and elevation of purpose
   3. To ride upon the railroad rather than to be ridden upon
   4. To keep the sleepers down and level in their beds

3. Which of the following stands for the opposite of the goal of Passage I?

   1. The Nation
   2. Spartan simplicity
   3. The sleepers
   4. Building railroads






4. Which of the following best represents the goal stated in Passage II?

   1. To sow the seeds of wisdom
   2. To come like water and to go like wind
   3. To be jocund with the fruitful grape
   4. To do whatever the moving finger writes

5. Which of the following stands for the opposite of the goal of Passage
II?

   1. Doctor and Saint
   2. Sowing the seeds of wisdom
   3. Whatever the moving finger writes
   4. Endeavor and dispute over This and That

6. Which of the following best represents the goal stated in Passage III?

   1. The kingdom of God and his righteousness
   2. Sufficient unto the day is the evil thereof
   3. Take no thought for your life
   4. Refrain from any sort of labor

7. Which of the following stands for the opposite of the goal of Passage
III?

   1. Mammon
   2. The morrow
   3. Food and clothing
   4. Hard work of any kind

8. Which of the following descriptions of man's role in life as conceived
in these passages is LEAST accurate?

   1. Passage I: Man is a tool-using animal.
   2. Passage II: Man is a puppet of fate.
   3. Passage III: Man is a child of God.

   Remember: Which interpretation of each passage is LEAST
   accurate?

9. Which passage expresses concern over the exploitation of workmen in
the pursuit of material values?

   1) Passage I   2) Passage II   3) Passage III   4) None of them

10. Which passage places its chief emphasis on service to others?

   1) Passage I   2) Passage II   3) Passage III   4) None of them

11. Which passage or passages regard simplicity as essential to a good
life?

   1. All, about equally
   2. None of them
   3. Passages I and III
   4. Passage II

12. Which of these views is based on a conviction that there are no
answers, that effort is futile?

   1) Passage I   2) Passage II   3) Passage III   4) None of them




13. Passages II and III both deny the value of "taking thought." How do
they differ?

   1. II regards thought as unrewarding; III as a necessary evil.
   2. II refers to thought about philosophic issues; III to thought about
      making a living.
   3. II prefers action to thought; III prefers faith.
   4. II refers to thought about fate; III to thought about God.

14. All three passages seem to regard material possessions as
unimportant. Which statement of their reasons for thinking so is LEAST
accurate?

   1. Passage I: We should reduce our wants rather than increasing our
      means of satisfying them.
   2. Passage II: It is pleasanter to drink wine.
   3. Passage III: Striving for worldly goods interferes with the service of
      God.

   Remember: Which interpretation of each passage is LEAST
   accurate?

15. In which ways are the "sleepers" of Passage I like the "lilies" of
Passage III?

   1. Both are subjects of parables.
   2. Both illustrate how men should act.
   3. Both illustrate what happens to people who concentrate on
      material things.
   4. Both illustrate the advantages of simplicity.

16. Which of the following pairs of passages are closest together in point
of view?

   1) I and II   2) I and III   3) II and III

17. Which passage or passages emphasize the thought of the following
quotation:

   The world is too much with us: late and soon,
   Getting and spending, we lay waste our powers.

   1) All of them   2) None of them   3) I and III   4) II

18. Which passage agrees with the thought of the following quotation:

   In the fell clutch of circumstance
   I have not winced nor cried aloud.
   Under the bludgeonings of chance
   My head is bloody, but unbowed.

   1) Passage I   2) Passage II   3) Passage III   4) None of them






19. Which passage agrees with the thought of the following quotation:

   Nature has placed mankind under the governance of two sovereign
   masters, pain and pleasure. It is for them alone to point out what we
   ought to do, as well as to determine what we shall do. On the one hand
   the standard of right and wrong, on the other the chain of causes and
   effects, are fastened to their throne. They govern us in all we do, in all
   we say, in all we think: every effort we can make to throw off our
   subjection will serve but to demonstrate and confirm it.

   1) Passage I   2) Passage II   3) Passage III   4) None of them

20. Which passage agrees with the thought of the following quotation:

   The great cry that arises from our manufacturing cities, louder than
   their furnace blast, is all in very deed for this, that we manufacture
   everything there except men; we blanch cotton, and strengthen steel,
   and refine sugar, and shape pottery; but to brighten, to strengthen, to
   refine, or to form a single living spirit never enters into our estimate of
   advantages.

   1) Passage I   2) Passage II   3) Passage III   4) None of them

The Writing Test 

Directions. This student was asked to summarize and compare the views
expressed in these three passages; then to state and defend his own
position on this issue. His paper is reproduced here exactly as he wrote it,
except that each sentence is numbered. The questions that follow this paper
deal with larger aspects of writing than correctness of expression; they call
for the judgment of a critic rather than the skill of a proofreader. It would
be wise to read the paper as a whole before starting to answer the
questions, but you need not watch for errors in usage, punctuation, or
spelling, since ability to detect such errors is not tested in this part of the
examination.

(1) The three authors regard success in a job as unimportant because
many in obtaining success use others as stepping stones. (2) Success is
seeing the good in others and living a good life.

(3) Passage I considers any improvement in mechanical things as
unnecessary and unsuccessful because thousands of people are often hurt
in making the improvement. (4) Passage II says learning is important; it
also says that if you're going to do anything, don't do something you'll
regret, for what's done can't be undone. (5) Passage III stresses the point
that you shouldn't struggle for material things: food and clothing are
nothing compared to everlasting life. (6) All the authors agree that in
success there is happiness, and there is no happiness in gains made crookedly.

(7) I believe success in work can't be the most important element in life
but is very important. (8) Being successful in business doesn't necessarily
mean that you're leading a good life. (9) Many successful people have
reached their goal by robbing and cheating others. (10) Success in business
often leads to conceit, and many successful people can't see the beauty in
life for thinking only of themselves.

(11) Success in business is important in that it proves you can
accomplish something. (12) It is a good thing if you reach your goal honestly and
get happiness out of your success. (13) Many successful people aren't
happy. (14) The real success in life is happiness and making others happy.
(15) Many people are so busy rushing toward their goal that they haven't
time to be happy. (16) I believe success in business is important if you
don't let it obstruct your vision so that you can't see good in people, and it
takes up all your time.



Questions on This Paper 

1. In items 1-9 assume that the student's purpose is to show that success
in work is important, provided that ...; he mentions all of the
following but one. Which one does he leave out?

   1. Provided that it is honestly attained
   2. Provided that it brings happiness and leaves time for other forms of
      happiness
   3. Provided that it makes a constructive contribution to the common
      welfare
   4. Provided that it does not inflate the ego and prevent seeing good in
      others

2. In the light of this purpose, his review of the passages is

   1. adequate, for he answers their objections to regarding success in
      work as important.
   2. adequate, for he points out that their only fundamental objection is
      to dishonest success in work.
   3. inadequate, for he includes only what is relevant to his purpose and
      leaves out many other points that could be made.
   4. inadequate, for he neither recognizes nor refutes important
      objections to his position that can be found in these passages.

3. In the light of this purpose, the opening sentence

   1. starts at a good point in reviewing the passages by showing their
      only serious objection to his own position.
   2. starts at a good point but immediately falls into a
      misinterpretation.




   3. starts at a bad point; he should first point out what these passages
      say in favor of his position.
   4. starts at a bad point; he should first tell what each passage said
      before pointing out any conclusion that they hold in common.

4. In the light of this purpose, sentence 14 is

   1. the logical conclusion toward which his whole argument is
      directed.
   2. one of the major reasons on which his conclusion is based.
   3. only a restatement of his conclusion in slightly different terms.
   4. irrelevant to and inconsistent with his conclusion.

5. The student tries to show that "success in work is important" by

   1. first refuting the objections of the three passages and then building
      up his own case.
   2. misrepresenting the arguments of the passages and then refuting
      them.
   3. overlooking or misstating objections and then asserting and
      qualifying his view.
   4. the propaganda devices of name-calling, begging the question,
      exaggeration, and reiteration without proof.

6. The student misinterprets at least one point in his summary of each
passage, but everything he says about one passage is a
misinterpretation. Which passage is that?

   1) Passage I   2) Passage II   3) Passage III

7. At what point in the paper does the student's development of his own
position begin?

   1) Sentence 6   2) Sentence 7   3) Sentence 11   4) Sentence 14

8. There is one logical argument in support of the student's conclusion.
In which of the following sentences is it stated?

   1) Sentence 6   2) Sentence 7   3) Sentence 11   4) Sentence 14

9. Which of the following is the best comment on the student's
arguments in support of his conclusion?

   1. They are true as far as they go, but the argument is incomplete.
   2. They are repetitions of his conclusion in different terms, not
      arguments to support it.
   3. They sound plausible but commit many logical fallacies.
   4. There are about twice as many statements opposed to his
      conclusion as there are in favor of it.




10. In sentence 1, "use others as stepping stones" was probably suggested
by

   1. the remarks about the "sleepers" in Passage I.
   2. a misinterpretation of what Passage II means by "the moving
      finger."
   3. the reference to Solomon in Passage III.
   4. nothing that is stated or implied in any of the passages.

11. Sentence 2 is

   1. intended to summarize the positions of the three passages.
   2. intended to state the student's own position.
   3. intended to state a point on which the passages and the student
      agree.
   4. not clear as to which position is intended.

12. Sentences 7-10. This paragraph

   1. is a fair statement of the main point at issue.
   2. misses the point, which is whether even honest success in work is an
      essential element of a good life.
   3. misses the point, which is whether individual success makes for
      social progress.
   4. misses the point, because none of the passages mentions "conceit."

13. Sentence 11 is

   1. good, because it gives a reason for regarding success in work as
      important.
   2. good, because it answers the objections raised by Passage I.
   3. poor, because the last word, something, is vague.
   4. poor, because no one needs to be told why success in work is
      important.

14. Compare sentence 2 with sentence 11.

   1. The student is inconsistent in these sentences.
   2. The student is consistent because these sentences mean the same
      thing.
   3. The student is consistent if sentence 2 refers to views stated or
      implied in the passages while sentence 11 refers to the student's own
      position.
   4. Even so, the student is inconsistent because success is not the same
      thing as living a good life.

Directions continued. Items 15 to 20 are concerned with precision and
accuracy of expression. Since we have already read and can refer to the
passages that the student is trying to summarize, we can judge which answer
to each of these items gives the most accurate interpretation.






15. Sentence 3: because thousands of

   1. people are often hurt
   2. workmen are injured
   3. investors are defrauded
   4. lives are used up

16. Sentence 4: Passage II says learning is

   1) important   2) vital   3) insufficient   4) useless

17. Sentence 4 (after the semicolon): it also says that

   1. if you're going to do anything, don't do something you'll regret, for
      what's done can't be undone.
   2. if you have to decide on a course of action, be very careful, because
      one mistake can ruin you.
   3. striving to accomplish anything is futile, because everything that
      happens is determined by fate.
   4. life should be devoted to pleasure, because it will end soon enough
      anyway.

18. Sentence 5: food and clothing are nothing compared to

   1. everlasting life.
   2. health and success in life.
   3. the birds and the lilies.
   4. the service of God.

19. Sentence 6: there is no happiness in

   1. gains made crookedly.
   2. ill-gotten gains.
   3. material wealth.
   4. the fruitful grape.

20. Sentence 16. Which of the following endings of this sentence comes
closest to what the student probably meant?

   1. and it takes up all your time.
   2. and it does not take up all your time.
   3. and if you let it take up all your time.
   4. and if you don't let it take up all your time.

Answers 

The three passages: 1-3 2-2 3-4 4-3 5-4 6-1 7-1 8-1 9-1 10-4 11-3
12-2 13-2 14-2 15-1 16-2 17-3 18-4 19-4 20-1

The student paper: 1-3 2-4 3-2 4-4 5-3 6-2 7-2 8-3 9-4 10-1 11-4
12-2 13-1 14-3 15-4 16-4 17-3 18-4 19-3 20-4
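
A key printed in this item-answer form lends itself to mechanical scoring. The following is a minimal sketch, not part of the original text: it parses the key for the student paper (reading the entry for item 15 as 4) and counts the responses that match. The function names and the sample answer sheet are hypothetical.

```python
# Key for the 20 items on the student paper, in "item-answer" form.
STUDENT_PAPER_KEY = (
    "1-3 2-4 3-2 4-4 5-3 6-2 7-2 8-3 9-4 10-1 "
    "11-4 12-2 13-1 14-3 15-4 16-4 17-3 18-4 19-3 20-4"
)


def parse_key(text):
    """Turn 'item-answer' pairs into a dict {item number: keyed choice}."""
    key = {}
    for pair in text.split():
        item, answer = pair.split("-")
        key[int(item)] = int(answer)
    return key


def score(key, responses):
    """Count the items on which the response matches the key."""
    return sum(1 for item, answer in key.items() if responses.get(item) == answer)


key = parse_key(STUDENT_PAPER_KEY)

# Hypothetical answer sheet: correct everywhere except items 5 and 17.
responses = dict(key)
responses[5] = 1
responses[17] = 2
number_right = score(key, responses)
```

An omitted item simply fails to match the key, so it is counted as wrong rather than raising an error.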






A Related Writing Assignment

Passage III is from the New Testament in the King James translation of 
the Bible, and it has always made thrifty Christians uncomfortable. The 
injunction that is hardest to take literally is "Take therefore no thought
for the morrow." How can we reconcile this advice with the following pas- 
sage from the Old Testament in the same translation of the Bible? 

Go to the ant, thou sluggard; 

Consider her ways, and be wise: 

Which, having no guide,

Overseer, or ruler 

Provideth her meat in the summer 

And gathereth her food in the harvest. 

How long wilt thou sleep, O sluggard? 
When wilt thou arise out of thy sleep? 
Yet a little sleep, a little slumber, 
A little folding of the hands to sleep:
So shall thy poverty come as a robber,
And thy want as an armed man. 

Write a paper in which you explain and, if possible, resolve the seeming
contradiction between these two passages. You may approach this task in
any way you like, but it may help you to get started if you consider the fol-
lowing suggestions. First, you might explain what the apparent contradic-
tion is, and show the dilemma in which a devout believer is placed. Then
you might write a careful explanation of what you think these passages
mean, supporting your interpretation with relevant quotations. You might
examine the case for the "ant," then the case for the "lilies," giving rea-
sons for acting in accordance with each position, and then showing what
difficulties an extreme adherence to either position would entail. Finally,
you might try to work out a resolution of the conflict: either a way of rec-
onciling the two positions or some middle ground between them that you
would regard as a tenable position. Remember that both passages are
translations, first published in 1611. The words are not those of the origi-
nal writers; some expressions may have changed their meaning or conno-
tations in the centuries that have gone by since this translation was made;
and even as they stand, these passages may be interpreted in different
ways. To show you how widely scholars differ in their interpretations of
these texts, here is a recent, authoritative translation of the last paragraph
in Passage III: "So do not worry about tomorrow; tomorrow will take care
of itself. Each day has enough trouble of its own."

We hope this assignment will not offend either devout believers in the
Bible or followers of the other great religions of the world. It is not our
purpose to show that the Bible offers contradictory advice. On the con-
trary, we believe that a careful interpretation of these passages will reveal
no contradiction but only a difference in emphasis: a difference that exists
among the followers of all religions.

You need not worry that a recent decision of the Supreme Court of the 
United States forbade compulsory reading of the Bible as a devotional 
exercise in public schools. The same decision explicitly permitted and even 
encouraged voluntary study of the Bible as literature, philosophy, or his-
tory. Here the purpose is literary: the interpretation and comparison of
two passages of singular beauty. 

A Comment on This Assignment 

It is not necessary for the essay topic to be as closely related as this to the 
objective exercise, nor is this a common practice in college examinations. 
Indeed, if the objective sections consist of discrete items, unrelated to any 
central theme, as is usually the case, no such connection in thought is pos-
sible. But if you and your colleagues go to the trouble of preparing objec-
tive exercises on interpretation and criticism that are unified around a
single topic or problem (along the lines of those you have just seen), you
will naturally want the essay written in this session of the examination to
deal with some aspect of the same theme. The students will be "warmed
up" by answering questions on one or more passages dealing with this
theme and on a student paper based on the passages. By that time they
will have given a good deal of thought to the topic and will probably have
generated some ideas of their own that they would like to express.

After all, it is somewhat unnatural and artificial to assemble a group of
students on a given day and ask them all to write a paper about some un-
expected topic that they may never have thought about before. We have to
do it because, if we announce the topic several days in advance in order to
give them time to study it and think about it, they may get varying
amounts of help from their family or their friends. Keeping the topic a
secret until the examination begins is the only way to make sure that each
paper is the student's own unaided work. It is not wholly unreasonable,
because this is not a test of creative writing; it is a test of ability to write
something coherent and sensible on demand, as those of us who work in
offices have to do every day. Still, it takes students some time to generate
ideas about an unexpected topic; to discard those that, after considera-
tion, seem irrelevant, inconsistent, or indefensible; and then to arrange the
rest in a logical and effective order. It is no wonder that most of them do
not write as well in this situation as they do on papers written at home, to
which they have devoted a good deal of time and thought. Only after the
examination do many of them think of all the good things they ought to
have written.

Whether or not this "warming up" makes enough difference to justify
the time and work involved in preparing such unified examinations, the
foregoing assignment illustrates the sort of extended assignment with a
good deal of "stimulus material" that is often used in college examina-
tions. You can see how much more food for thought it provides than the
brief topics listed on pages 60-61.



Vocabulary. The most common type of vocabulary item was illustrated at
the top of page M: the word to be defined is underlined and is followed by
a choice of three or four defining words and phrases. Every word in these
definitions should be more familiar than the word to be defined. Both
Edgar Dale and I, who have made extensive studies of the familiarity of
English words to American students, have found that three-choice vo-
cabulary items work as well as four-choice. The greater element of chance
in the three-choice item is offset by the larger number of responses one can
get per unit of time. Some teachers have the idea that all the choices must
be single words. Such a restriction is pointless; I prefer several words as in
the definition of exploit on page 37: "make use of for one's own benefit."
Here are some other common types of objective vocabulary items:

Completions. Which pair of words best fits the meaning of this sentence?

From the start, the islanders, despite an outward ______, did what
they could to ______ the ruthless occupying power.

1. harmony, assist 2. enmity, embarrass 3. resistance, destroy
4. acquiescence, thwart

Opposites. Which of these is the opposite of the italicized word?
chronic: 1. slight 2. temporary 3. wholesome 4. patient

Analogies. Which pair of words is related in the same way as trigger :
bullet?

1. handle : drawer 2. holster : gun 3. bulb : light 4. switch : current






Right-wrong sentences. Mark each sentence R (right) if the italicized
word is used correctly; W (wrong) if it is used incorrectly.

I iuihov you to treat the matter confidentially. (R)

A barely culpable heartbeat showed that the victim was still alive. (W)

Listening comprehension. Listening comprehension passages and items
are prepared in the same way as reading comprehension passages and
items, and we have seen plenty of examples of the latter on pages 63-68.
The main differences are that the passages (which are read aloud by the
teacher) should be material of a sort that is normally listened to rather
than read: stories, conversations, lectures, directions, short and relatively
simple poems, etc. The test booklets that students mark have only the four
answers to each question, but not the questions themselves, which are read
aloud by the teacher. For example, the first story in a test of this sort is
about an eagle and a fox. The first item in the test booklet has only this:
1) Looking for food. 2) Sleeping on a rock. 3) Trying to hide from the eagle.
4) Drinking from the stream. These make no sense until the teacher fin-
ishes the story and reads the first question: What was the fox cub doing
when the eagle saw it? Then it is clear that the correct answer is 2) Sleep-
ing on a rock. This device keeps the students from marking their answers
during the reading of the passages.

English usage, sentence structure, and punctuation. There are innumer-
able ways of testing students' knowledge of the rules and conventions of a
language and no clear-cut superiority of one way over another in terms of
correlations with carefully determined grades on samples of the students'
own writing. I used to use student papers with a large number of errors,
including some that I inserted myself. These were printed in the left-hand
column of a divided page with certain portions underlined or enclosed in
brackets. Opposite each marked portion were from two to four ways of
writing, arranging, or punctuating it, always starting with the one that
appeared in the left-hand column. To keep students from assuming that
this first choice was always wrong, I would sometimes put the best choice
on the left side and transfer what the student had written to the right-hand
column as one of the choices. Sometimes the intended answer was to trans-
fer that part of the sentence to some other place; sometimes it was to omit
that part entirely. Although this was a realistic way of testing correctness
of expression, since it virtually duplicated the act of proofreading, I was
never able to prove that it yielded results that were superior to those of
other item-types that were easier to prepare and assemble. By using actual
student writing, I was stuck with whatever errors a particular student hap-
pened to make, plus a few that I inserted, and these might or might not
reflect the weaknesses of the class or the rules we had been studying.






I therefore abandoned this effort at realism in testing and substituted
discrete items in which no sentence had any connection in thought with
any other sentence. I had a long list of the most common errors in the writ-
ing of American students that persist through the freshman year in college
(age 18). I embodied each error in a sentence and broke up the sentence
into three lines of about equal length, making sure that the whole error lay
within one of the three lines. The directions were simply to mark the line
that contained an error or 0 if there was no error. One can test the ability
to detect almost any type of error in usage, word choices, sentence struc-
ture, and punctuation in this format. At first I included spelling errors,
but even good students and teachers tended to overlook them in this type
of test; they were looking for bigger game. Hence I cut out the spelling
errors and made separate spelling tests of 100 words each, about half
spelled correctly and the rest incorrectly, to be marked R (right) or W
(wrong).

Here are just a few examples of the three-line sentence item-type: 

1. She asked whether                  1. His last address
2. we would be ready                  2. was seventy-four
3. to leave by noon?                  3. Poe Lane, Albany.

1. Last Saturday Chester and          1. "Please don't do
2. Bud went fishing and               2. that", said Mary
3. brought back ten of them.          3. to her sister.

1. She is one of those rare           1. If I had known that the
2. women who never cares about        2. assignment was important,
3. wearing stylish clothes.           3. I would of done it quickly.

It is obvious that such items are easy to write, assemble, reproduce, and
score. They approximate the act of proofreading one's own work, since
there are no marked portions drawing attention to possible errors, and one
is not told what kinds of errors to look for; one has to be ready for any-
thing. Such items do not test the ability to correct such errors or avoid
them in one's own writing, but students who are good at detecting them
tend also to be good at correcting and avoiding them. If my memory is cor-
rect, this item-type was first suggested by S. Donald Melville when he was
the director of the Cooperative Test Division of ETS. It makes the work of
preparing objective tests of English usage a great deal easier than any
other item-type I have used for this purpose, and it works as well as any
other.
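
The three-line format is simple enough to represent and score mechanically. The following is a minimal sketch, not taken from the original: the class and function names are hypothetical, and the keyed lines for the two sample sentences are an editorial reading of where the error lies, since no key for these examples is printed in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreeLineItem:
    lines: tuple  # the sentence broken into three lines
    key: int      # 1, 2, or 3 = line containing the error; 0 = no error

    def is_correct(self, response):
        """True if the student marked the keyed line (or 0 for no error)."""
        return response == self.key


def score(items, responses):
    """Number of items on which the marked line matches the key."""
    return sum(item.is_correct(r) for item, r in zip(items, responses))


items = [
    # Assumed key: a question mark after an indirect question (line 3).
    ThreeLineItem(
        ("She asked whether", "we would be ready", "to leave by noon?"), 3
    ),
    # Assumed key: "would of" for "would have" (line 3).
    ThreeLineItem(
        (
            "If I had known that the",
            "assignment was important,",
            "I would of done it quickly.",
        ),
        3,
    ),
]

# Hypothetical student: detects the first error, marks "no error" on the second.
number_right = score(items, [3, 0])
```

Because the key is a single integer from 0 to 3, scoring reduces to an equality check, which is part of what makes the item-type so easy to assemble and score.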






Common Errors in Usage and Sentence Structure 

In a tryout of 580 items of the three-line sentence type in secondary 
schools, I found that the items most frequently missed (marked incorrect-
ly) could be classified under the following 20 headings. When the name of
the error is universally understood by teachers of English, I give only the
name; otherwise I give a brief statement of the rule that was violated,
sometimes with a warning that modern linguists and editors accept certain 
constructions that were formerly regarded as errors. 

1. Sentence fragment, incomplete sentence (if unintentional) 

2. Comma splice, fused sentence (main clauses joined only by a comma
without a conjunction, or by nothing at all)

3. Run-on or strung-together sentences (more than two main clauses un-
less they are short, of the same pattern, or separated by semicolons)

4. Carelessly omitted words or parts of words, especially endings

5. Careless or needless repetition

6. Adjective for adverb and vice versa

7. Confusion of subject and object forms of six pronouns: I, he, she, we,
they, who. Many linguists accept who as an object form, especially in
questions, but whom is not accepted as a subject form.

8. Shall-will, should-would. The rules governing these word choices are
so complex and so rarely mastered that some linguists advise using
will and would regularly; should only in the sense of ought to. In cur-
rent American speech, will occurs 217 times for every shall; would
nine times for every should. British usage differs from American on
this point and uses shall and should more frequently.

9. Subject-verb agreement, especially after there and after a compound
subject joined by and or or. Speakers of some American dialects often
omit final -s in writing because they neither hear it nor pronounce it.

10. Indefinites such as anyone, anybody, someone, everybody, each,
either, neither, and none take a singular verb and following pronoun if
the meaning permits; but none and neither are often plural, and
sometimes both singular and plural follow, as in "Everybody was
there, but they have gone home," and "If anyone calls, tell them to
call back."

11. Pronoun-antecedent agreement: two antecedents with and usually re-
quire the plural; with or the pronoun agrees with the nearer antece-
dent.

12. Pronoun reference: what a pronoun refers to should be clear from the
sentence structure, meaning, or context; but it, this, that, and which
may refer to the whole preceding clause if no ambiguity results.

13. Tense: wrong form, improper sequence, needless shift.

14. Parallel structure: sentence elements having the same function should,
if possible, be parallel in form (e.g., not a clause, a gerund, an infini-
tive, and a noun as members of the same series).

15. Misplaced modifiers (especially dangling participles and only) should
be counted as errors only if they appear to modify something they can-
not logically modify, often with ludicrous effect.

16. Abbreviations: the safest rule is to avoid abbreviations in sentences
except Mr., Mrs., Ms., Dr., St. (Saint), a.m., and p.m.; Hon. and Rev.
may be used only when the first name, initials, Mr., or Dr. precedes the
surname.

17. Contractions (such as don't) are permissible in anything less formal
than a dissertation, but some students have to be cautioned against
excessive use of them.

18. Possessives: omitted or misplaced apostrophe; her's, it's, your's,
their's, and who's are incorrect. There has been a long controversy
over whether the possessive should be used before -ing forms, but our
editors now tend to accept either "I'm surprised at his saying that" or
"I'm surprised at him saying that."

19. Numbers: some publications now use figures for even small numbers
like 2 or but most prefer writing out numbers in sentences unless 
more than two words are required, unless several numbers occur in
the same sentence, and unless they are pages or divisions of a book,
street numbers, dates, and time of day if followed by a.m. or p.m.
Numbers like $10 million are now common. A number beginning a
sentence must be written out.

20. Capitals: although usage varies, we generally capitalize names of per-
sons, places, languages, organizations, days, months, holidays; histor-
ical periods, events, or documents; titles before names; first word and
all others except articles, prepositions, and conjunctions in titles of
publications and papers written by students (but not always in biblio-
graphic entries); first word in every line of poetry (or as printed); first
word in every sentence including quotations and inserted statements.
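
The rule for titles in item 20 is mechanical enough to sketch in code. This is an illustration only, not part of the original text: the function name and the short list of articles, prepositions, and conjunctions are assumptions, and a real style sheet would need a fuller list.

```python
# Assumed (and deliberately incomplete) list of articles, prepositions,
# and conjunctions that stay in lower case inside a title.
MINOR_WORDS = {
    "a", "an", "the",
    "and", "but", "or", "nor",
    "at", "by", "for", "in", "of", "on", "to", "with",
}


def capitalize_title(title):
    """Capitalize the first word and every word not in MINOR_WORDS."""
    words = title.lower().split()
    result = []
    for index, word in enumerate(words):
        if index == 0 or word not in MINOR_WORDS:
            result.append(word[:1].upper() + word[1:])
        else:
            result.append(word)
    return " ".join(result)


title = capitalize_title("measuring growth in english")
```

The sketch lowercases the whole title first, so it would damage acronyms and proper names that carry internal capitals; it is meant only to show the first-word and minor-word exceptions.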

The remaining types of three-line sentence items that gave American
students the most trouble were connected with the use of the following
punctuation marks: comma, dash, semicolon, colon, question mark, apos-
trophe, ellipses, quotation marks, and breaks within quotations. I omit all
errors in word choices, since there are too many to classify.

My final word of advice on such tests is not to despise them. Objective
test items can easily, quickly, and reliably test a student's knowledge of the
rules and conventions of English. In writing, if a student is not sure that he
knows how to use a certain construction, he can change his sentence to
avoid using it, but in an objective test, you can give him a sentence with
that construction in it, and he has to decide whether it is correct or incor-
rect. Arguments over whether answering objective items "is the same thing
as" actual writing or speaking are futile. For that matter, writing one essay
is not "the same thing as" writing another essay, even on the same day: we
have seen that even the most carefully determined grades on such essays
rarely correlate higher than .70. Then the really astonishing thing is that
scores on a good objective test of English usage often correlate about .70
with averages of the two essay grades. It does not matter that they do not
"really" measure "the same thing." If students who are good at one also
tend to be good at the other, and vice versa, then it is a good indicator of
proficiency in written English. Call it an editing test if you like, but I can
promise you that students who do well on it also tend to be good writers.
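
For readers who want to check such a correlation on their own classes, the computation is short. The following is a minimal sketch, not from the original: the function name and all of the scores are invented for illustration, and the result says nothing about the .70 figure quoted in the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data for six students: objective usage test scores
# and two essay grades each.
usage_scores = [31, 42, 38, 25, 47, 35]
essay_1 = [4, 6, 5, 3, 7, 5]
essay_2 = [5, 7, 4, 4, 6, 4]

# Average the two essay grades, then correlate with the objective scores.
essay_avg = [(a + b) / 2 for a, b in zip(essay_1, essay_2)]
r = pearson_r(usage_scores, essay_avg)
```

Averaging the two essays before correlating follows the procedure described in the text, where the objective score is compared with the average of two essay grades rather than with either essay alone.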






A Short Test of Knowledge of Grammar 

Although I have frequently inveighed against the teaching of English
grammar, since most students refuse to learn it, and research in several
countries over a long period of time has shown little, if any, connection be-
tween any type of grammar, traditional or modern, and improvement in
writing, I have to admit that most teachers of composition devote an in-
ordinate amount of time to it. I often suspect that they run away from the
problem of teaching writing and teach grammar instead. Wondering how
this time could be shortened, I wrote out the rules governing standard
usage in the most common types of errors (described in the last section)
and counted the number of technical grammatical terms that I had to use
in stating them. I found that I could get by with forty, which I arranged in
five groups as follows:

1. active, passive, linking; subject, verb, object, complement; helping
verb

2. phrase, clause (independent, subordinate, coordinate); simple, com-
pound, complex

3. noun, pronoun, adjective, adverb, preposition, conjunction, article,
interjection

4. singular, plural, possessive; tense, perfect; modify, agree, apposition

5. number, case, person; infinitive, participle, gerund; conditional,
parenthetical
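
As a quick check on the count of forty, the five groups can be tallied. This is an illustrative sketch rather than anything in the original; the variable names are arbitrary, and it assumes that the three kinds of clause named in parentheses (independent, subordinate, coordinate) each count as a term.

```python
# The five groups of terms as listed in the text.
GRAMMAR_TERMS = {
    1: ["active", "passive", "linking", "subject", "verb", "object",
        "complement", "helping verb"],
    2: ["phrase", "clause", "independent", "subordinate", "coordinate",
        "simple", "compound", "complex"],
    3: ["noun", "pronoun", "adjective", "adverb", "preposition",
        "conjunction", "article", "interjection"],
    4: ["singular", "plural", "possessive", "tense", "perfect",
        "modify", "agree", "apposition"],
    5: ["number", "case", "person", "infinitive", "participle",
        "gerund", "conditional", "parenthetical"],
}

group_sizes = {group: len(terms) for group, terms in GRAMMAR_TERMS.items()}
all_terms = [term for terms in GRAMMAR_TERMS.values() for term in terms]
total = len(all_terms)
```

On that reading each group holds eight terms, which gives the forty the text mentions.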

Some linguists insist that there are only four parts of speech, but they 
treat pronouns as a subclass of nouns; they begin talking about preposi- 
tions when they get to phrases, and conjunctions when they get to clauses; 
and they call articles "determiners," but I can see no advantage over the 
familiar term. I included interjection only because I had to use it in the
rule about setting it off with a comma or exclamation point.

Many linguists treat the passive as a transformation, but in my experi-
ence young students do not grasp it unless it is included in the list of basic
sentence patterns. The term "transitive," however, seems to me to make
more trouble than it is worth, and I doubt that young students need to dis-
tinguish direct and indirect objects. When a sentence contains both, I de-
scribe it as subject-verb-object-object.

Some teachers may want to add a few terms to my list, but I doubt that
anyone would really need more than fifty. The quickest way I know to find
out whether students can use such terms in describing a sentence is illus-
trated by the following test.






The "Shadow" Test

Directions: Encircle the number of
the best answer to each question.

The test is based on one sentence: 

I have a little shadow that goes in 
and out with me and what can be the 
use of him is more than I can see. 

1. This sentence may be hard to 
read because one comma has been 
left out. Where would you put a 
comma to break up the sentence 
into two main parts? 

1. After shadow 

2. After me 

3. After him 

4. After more 

2. What kind of sentence is this? 

1. Simple 

2. Complex 

3. Compound 

4. Compound complex 

3. What is I have a little shadow?

1. The subject of the sentence 

2. The first independent clause 

3. The first subordinate clause 

4. The subject of him (line 3) 

4. What is that goes in and out with
me?

1. The first independent clause 

2. A subordinate clause, object of 
have 

3. A subordinate clause modifying 
shadow

4. A subordinate clause modifying? 

5. What is and?

1. A coordinating conjunction

2. A subordinating conjunction 

3. A relative pronoun 

4. A preposition modifying what 



6. What is and what can be the use
of him?

1. The second independent clause 

2. A subordinate clause modifying 
shadow

3. A subordinate clause, subject of 

is 

4. A subordinate clause, subject of 

see 

7. What is than I can see? 

1. The second independent clause 

2. A subordinate clause, object of is 

3. A subordinate clause, object of 

more 

4. A subordinate clause modifying 

more

8. What is is? 

1. Verb of second independent 

clause 

2. Verb of second subordinate 
clause 

3. Verb modifying more

4. A verb that does not have a sub- 
ject 

9. What is more? 

1. A coordinating conjunction 

2. A subordinating conjunction 

3. An adverb modifying than I can see

4. A linking verb complement 

10. What is the subject of the first 
independent clause? 

1. I

2. shadow

3. I have a little shadow

4. that goes in and out with me

11. What is the subject of the sec- 
ond independent clause? 

1. shadow 

2. that goes in and out with me

3. what can be the use of him

4. more than I can see



12. How many subordinate clauses 
are there in this sentence? 

1. One 

2. Two 

3. Three 

4. Four 

13. What is the subject of the first 
subordinate clause? 

1. shadow 

2. that 

3. what 

4. more 

14. What is the subject of the sec- 
ond subordinate clause? 

1. what

2. use 

3. him 

4. more 

15. What is the subject of the third 
subordinate clause?

1. There is no third subordinate 
clause. 

2. what 

3. use 

4. I

16. What is the verb of the first in-
dependent clause?

1. have

2. goes

3. can be

4. can see

17. What is the verb of the second 
independent clause? 

1. goes

2. can be

3. is

4. can see

18. What is shadow?

1. Subject of the whole sentence

2. Object of have

3. A linking verb complement

4. Object of the preposition little

19. What are in and out?

1. Prepositions

2. Adverbs

3. Objects of goes

4. Adjectives modifying with me

20. What does with me modify?

1. shadow 

2. have 

3. goes 

4. in and out

21. What is what?

1. A relative pronoun 

2. An interrogative pronoun 

3. An indefinite pronoun 

4. A personal pronoun 

22. What is of him?

1. Object of the verb use 

2. Prepositional phrase modifying 

use 

3. Prepositional phrase, subject of 

is more 

4. Prepositional phrase modifying 

can be 

23. What is than? 

1. A coordinating conjunction

2. A subordinating conjunction 

3. An adverb modifying can see 

4. A relative pronoun, object of 

can see 

24. Can be is a different form of the 
same verb as 

1. have. 

2. goes.

3. is.

4. can see.

25. What is can in can be and can
see?

1. An adverb

2. An auxiliary

3. The subject

4. The object

26. The subordinate clauses in this
sentence have three of the following
functions. Which one do they not
have? 

1. Noun 

2. Verb 

3. Adjective

4. Adverb






Here is the sentence again: I have a little shadow that goes in and out with me 
and what can be the use of him is more than I can see. 

Rewrite this sentence in as many of the following ways as you can. Use the 
same words that are in this sentence but change the form and order of these 
words as required. Try not to change or omit any of the ideas expressed by this 
sentence. Each rewritten version should be a single complete sentence. 

27. Start with I had a little shadow.



28. Start with I cannot see the use. 



29. Start with The children had. 



30. Start with Do you have. 



31. Start with What can be the use.



32. Start with Going in and out with me.



33. Start with More than I can see. 



34. Start with Go in and out. 



Learning to Write 

I do not want to end this booklet with treatments of mechanical errors 
and grammatical terms, because teachers devote altogether too much time 
to them already. To give a broader view of what students need to learn 
about writing— at least by the end of the freshman year in college — I have 
decided to conclude with a list of ninety-six things that I have tried to 
teach in one way or another: by direct instruction, by comments on 
papers, and in conferences with students. They may be regarded as an ex- 
tended list of objectives, but I wanted my students to read it so that there 
would be no mystery about what I intended to teach. Hence I could not use 
the maddening repetition of "Ability to . . . Ability to . . . Ability to . . ."
nor the form of statement advocated by Magers and others: "Given a set
of twenty sentences, students will indicate which ones contain colorful 
words or expressions with not more than four errors." Even teachers 
would refuse to read ninety-six statements of that sort. I therefore decided 
to state my goals in the form of advice to students on learning to write, 
with as much variety of statement as possible. I began with the following 
paragraph to show that I did not expect all students to follow all these in-
junctions all of the time:

"No general statement about writing, including this one, is 100 percent
true. The following statements are probably true of 10 to 90 percent of
good writing. They are no less useful because they are not universally true.
What even 10 percent of good writers do most of the time, or what all good
writers do even 10 percent of the time, is likely to be suggestive and
helpful."







A. The Writer 

1. Students should form a definite and serious intention of becoming good
writers, fully realizing the difficulty, the feasibility, and the value of the
enterprise. They should not take this intention for granted. They should
consider the question seriously and at length, make up their minds delib-
erately, and mark their resolution by some outward act. It may be neces-
sary to start from a conviction of sin: an awareness of the limitations of
their present writing, and a deep concern about it. At the other end of the
scale they should recognize excellent writing when they see it and wish to
emulate it.

2. Students should feel a glow of exultation when they have written a good
phrase, sentence, paragraph, or paper. They should care enough about the
quality of their writing to spend the time necessary to do a good job. They
should realize that practiced writers will gladly spend an hour or more
over each page.

3. Writers must be willing to throw away hard-written paragraphs or
pages, even though they are clever, once it becomes clear that they do not
belong. They must cultivate the art of waste-basketry.

4. When students have to write something, they should set about it
promptly, with confidence that they can do it well. They should not post-
pone the task indefinitely because they feel that they "can't write."

5. The first step in writing is to think about the problem or topic: not to
begin writing anything that comes to mind, not to search through books
for an idea, and not to run away from the problem and write about some-
thing else. Fifteen minutes of honest thinking about any problem will
usually yield some idea about it that is worth writing down. The way to
interest people is to have an idea.

6. The ideas about a problem or topic that occur to one in the process of
thinking about it are the only things worth writing down: not what some-
one else has said about it, what people usually say about it, or what you
think the teacher would like you to say about it. Information about a topic
should never be used in place of an idea; it should be used only to support
or illustrate an idea. Students should not be dismayed if the ideas that
occur to them do not solve the whole problem, and they should not expect
to present very many or very important ideas. One small idea per paper is
above average.

7. Students should be cautious about transferring to a new problem the
thinking they have done about a previous problem. It is well to see rela-
tionships, but not to save wear and tear on the brain tissue by using an old
idea over again. In too many cases the old idea does not really fit the new
problem.




«. Writiny sHduKI yivc assurana' that the writer is capable of looking a 
tact in the taee. oftakiii}. a definite stand, of telling the truth rather than 
what ho thinks people will like. It should not leave the impression that the 
writer uaiUs above everything else to avoid trouble— even at the cost of 
saying nothing. 

The most tiresome writing in the world is that which tries to protect it- 
self from every possible attack by putting in every possible exception, dual- 
t ication. and condition. It is like the aged spinster who still looks under 
the bed— but no man is sufficiently interested to hide there. 

«0. Writing should cut through the obvious, conventional, easv thing to 
say to the real issues underneath: to true feeling, fresh perception, inde- 
pendent thinking, on however humble a level. Pretentious writing is the 
most likely to miss this quality. Writing should mean something, not just 
mouth words. 

11. Writers should be willing to reveal themselves, not as they would like
to be, but as they are, confident that qualified readers will understand and
be interested. The model to imitate is the honest candor of a conversation
between friends.



B. The Whole Paper 

12. A paper ought to have a plan that will be apparent to the discerning
reader. 

13. A paper ought to have one central purpose, point, or idea, which we
shall refer to hereafter as the "theme." The student should consider very
carefully what he wants to accomplish: what impression or conclusion he
wishes to leave with the reader. In the beginning he should practice
formulating his central point or purpose in a single sentence and writing it down.

14. The title should be related to the theme. It should delimit the field of
the paper as sharply as possible without sacrificing other desiderata. It
should be brief, and the words chosen should be in keeping with the tone
of the paper. If the subject warrants it, the title may be arresting, but
young writers strain too hard to make it arresting.

15. Apart from the introduction and conclusion, there should rarely be
more than three or four main divisions in the short papers that students
write. The student should list the points he wants to cover, eliminate those
that are not essential to the theme, and group the rest under not more than
three or four main headings. He should mark the points he wants to
emphasize and consider what point will furnish the best entrance into his
subject.






16. Each main point should be clearly related to the theme, and should
reveal the way in which it is related: e.g., as illustration, proof, application,
etc.

17. The points in a paper should be arranged in the order that fits best (a)
the purpose in writing, (b) the logical requirements of the subject, and (c)
the requirements of the audience: what they already know, what they will
accept without question, what they will oppose, criticize, or misunderstand,
and what will move them most powerfully.

18. There should be a clearly marked beginning and ending.

19. The beginning should (a) be clearly related to the theme, (b) catch the
reader's interest, (c) show that the topic deserves consideration; that it is
interesting, important, or timely, and (d) state or suggest the purpose,
scope, and general method of organization.

20. The paper may begin with a direct reference to the title (never with
"this" or "it" intended vaguely to refer to the title), with a statement or
quotation bearing on the subject, with a pertinent narrative, with
background information, with an explanation of the timeliness or importance
of the topic, or in other ways too numerous to mention. One writer
suggests: "A paper that begins on a moralizing tone will never come to
anything."

21. The paper should stick to the scheme of organization stated or
implied in the beginning, or to the underlying pattern of organization, even
when it is not indicated in advance. A paper should not start out as one
thing and then turn into something else, except for good reason, and with
appropriate indications of the shift. A combination of two types of
organization, however, is not necessarily inconsistent: cause and effect, for
example, frequently requires a chronological organization as well.

22. Some of the common methods of organization are by time, space,
cause and effect, familiar to unfamiliar, classification, division, definition,
comparison and contrast, analogy, the order of impressions, the order of
climax, etc. These are not the only possible types of organization. They
rarely exist in a pure form: most actual schemes of organization could be
described only in terms of two or more of these headings.

23. Students should be able to organize the same material in different
ways to suit different purposes, occasions, or audiences.

24. Within a chronological organization it should be noted that usually
events cannot be related in a strict time sequence without confusing two or
more trains of events. One train should be followed to a convenient break
in the narrative before starting another.






25. A story should be told from a consistent "point of view," using only
events that could have been observed from that point of view. If other
events are necessary to the story, the observer should have some plausible
way of learning about them.

26. A long paper may need to be enlivened by changes of pace: e.g., by
examining some parts slowly and analytically, then quickly sketching out
several others that present no new problems, etc.

27. Students should be able to clarify and illuminate an abstract
discussion by the use of analogy without relying upon it as proof.

28. Students should be able to write an accurate literal definition, without
circularity, and to expand the meaning of a key term or concept by an
extended definition, developed by classification, function, distinctions,
historical causation, etc.

29. The most important parts of the paper should be treated at greatest
length or with the greatest emphasis: by position, choice of words, or
manner of statement. If necessary, one may say directly, in so many
words, that a given part is important. The other parts should be treated
in proportion to their importance, difficulty, or interest.

30. The ending should (a) if necessary, recall the chief points that have
been made, (b) state or suggest the conclusion that has been reached, the
resolution of the conflict or problem, (c) (possibly) show some application
of this conclusion, suggest next steps, etc., (d) point up or heighten the
emotional and imaginative significance of what has been said, (e) show
what has been said as one thing, even though it has been presented in
related pieces. Sentimental and moralistic endings should be avoided.



C. Paragraphs 

31. Paragraphs should be distinct, each dealing with a clearly separable
phase of the theme, and unified, with every sentence clearly related to the
topic sentence or central idea.

32. Paragraphs should be joined by smooth transitions that indicate or
reflect the relationship of the paragraphs to the central theme and to one
another.

33. Transitions may be made by connectives, by direct statements of
relationship, by repetition of key terms, and, above all, by a close connection
in thought.






34. A paragraph should have a beginning and an end, and should move in
an orderly fashion between the two. The sentences should go from one
consideration to another like a train of thought. The discerning reader 
should be able to see the connection between each sentence and the one 
preceding. He should never feel that a sentence should have occupied 
some other position in the paragraph. 

35. The topic sentence or central idea of a paragraph may be developed by
definition or explanation of terms, by distinguishing it from some other
idea with which it may be confused, by repetition with variation, by de- 
tails, instances, examples, comparison, contrast, analogy, proof, cause, 
effect, chronological development, and by other means too numerous to 
mention. The student should collect as many and as forceful details as are 
necessary to explain or support the central idea, in proportion to its im- 
portance in the paper as a whole. 

36. The development of an idea should include references to common and
familiar things to make the thought clear and the emotion lively. The
clumsy do it mechanically, first stating an idea abstractly, then giving an
example. The adroit can develop the idea concretely from the beginning.



D. Sentences 

37. It is frequently said that a sentence should usually put the idea that is
to be emphasized in the main clause, subordinate ideas in dependent
clauses and modifiers. This rule is highly questionable. Note that the very
sentence that states the rule does not obey it, nor does this sentence or the
next, and none of them would be greatly improved by following this
principle. Perhaps a better rule is that the form of a crucial sentence should be
so manipulated that the idea to be emphasized will come either first or
last.

38. A sentence should fit smoothly into its context by its choice and
arrangement of words. In a long sentence, the first part should grow out of
the preceding sentence; the last part should lead into the following 
sentence. 

39. The ways in which sentences are linked together, without overworking
trite connectives like "however" and "therefore," are an important and
time-consuming subject of study. The chief means is a close connection in
thought, so that each sentence has some logical relationship to the
surrounding sentences. No new terms or ideas that are likely to be strange
to the reader should be introduced without preparation or explanation. A
helpful device is the repetition of a key term, or synonym for it.



40. The modern sentence tends to be "loose" in construction. In general,
students should not try to imitate the "balanced" or "periodic" style of the
seventeenth century except in occasional sentences designed for special
effects. On the other hand, they should be able to get necessary
qualifications out of the way before making their main point.

41. The structure of a sentence should be simple and easy to follow. A
large number of subordinate clauses may be used only when they all have
the same pattern or function (e.g., "That man has had a liberal education
who . . . , who . . . , who . . . , and who . . ."). Clauses subordinate to
subordinate clauses should be used in moderation, and hardly ever a third order
of subordination.

42. Subordinate clauses should be introduced by connectives that clearly
and correctly indicate tlie relationship of the subordinate idea to the main 
idea. 

43. One should be able to write sentences in many forms to fit the mood,
to make the meaning clear, to flow into the surrounding sentences, to
make a point stand out. The length, order, and pattern of successive
sentences should be varied except when repetition is desired for emphasis.

44. A sentence usually consists of a subject, verb, and (maybe) an object or
complement. Each of these elements may be modified by words, phrases,
or clauses. Then there may be a comma followed by "and," "or," or
"but," a semicolon followed by a conjunction like "therefore," or a
semicolon without any other connective. These may be followed by another
subject, verb, and (maybe) an object or complement, and each of these
elements may be modified by words, phrases, or clauses, as before. But then,
except in most unusual circumstances, it is well to stop. There should
hardly ever be three main clauses except when they are short and of the
same pattern: "I came; I saw; I conquered," or "He came, and we told
him, but he would not listen."

45. Another limitation on the length of a sentence is that it should contain
only one idea. The idea may have several parts, but when it becomes two
ideas, it requires a second sentence. In practice, of course, it is sometimes
hard to tell where to draw the line, but criticism on this point will develop
judgment.

46. On the other hand, a style composed almost exclusively of very short
sentences sounds choppy and immature. Several adjacent sentences of this
sort are usually related to one central idea: one tells the cause, another the
time, a third the consequence, etc. With practice, one can turn most of
these sentences into subordinate clauses or modifiers.






47. A sin that is almost unforgivable in college is the joining of two
separate sentences by nothing but a comma, that betrays an abysmal lack of
"sentence sense."

48. The flow of thought within a sentence, except in unusual circumstances
(and for special effects, as in the works of Henry James), should not, as
in this one, be interrupted (again and again!) by the insertion of too many,
possibly unnecessary, parenthetical elements.

49. A sentence should come to the point with reasonable dispatch. The
necessary qualifications may be subordinated, buried in the middle, or 
added later. 

50. If a sentence lends itself to climactic order, the climax should not be 
spoiled by revealing the most powerful idea before the end, or by adding
qualifying words and phrases after it.

51. A primary quality of good writing is energy, not to be confused with a
facade of exclamation points, violent language, exaggeration, etc. Whether
poised or exuberant, the sentences should have a go about them.

52. Constructions within a sentence should be consistent with one another.
There should be no unnecessary shifts in subject, voice, tense, person,
or number. Phrases and clauses having the same function should usually 
be parallel in form. 

53. The reference of pronouns and of modifiers should be clear. When
starting with a participle, it should not be left dangling, as in this sentence.

54. In general, related words should be placed near one another. A good 
trick to learn, for example, is that of placing an adverb directly before or 
after the verb it modifies, whenever its normal position toward the end of 
the clause makes trouble with the following clause. 

55. A sentence should not contain any word that can be omitted without
spoiling the intended effect. On the other hand, constructions must be
complete: necessary words must not be omitted. "Of," "that," and the
second member of a comparison are frequently omitted without
justification.

56. A sentence should not be so ambiguous that a qualified and
well-disposed reader will have any serious doubt as to what is meant. On the other
hand, the attempt to remove every possible ambiguity results in a
tiresome, legalistic style. Precision should be sought only where it is
important, and to the degree necessary for the end in view. It is achieved even
more in manipulation of the context than by choice of words.






57. One should learn to use controlled ambiguity (a) to avoid unnecessary
argument, (b) to arouse emotion, and (c) to enrich meaning. Perhaps its
most common use in daily life is the "white lie" and the "face-saving
formula." At the other end of the scale, something like the "Four Freedoms"
can command devotion where a bill of particulars would provoke
dissension.

E. Words, Phrases, Figures of Speech 

58. Words should be chosen with an eye to (a) clarity, aiming at the degree
of precision appropriate to the context; (b) appropriateness to tone and
purpose; (c) effectiveness, using specific, vivid, forceful, or unexpected
words at points of emphasis; (d) euphony, avoiding words that are hard to
pronounce together; unintended rhyme, alliteration, or assonance; and
awkward, choppy rhythm.

59. One should learn to use a few words in unexpected senses and
contexts that awaken a fresh perception of their meaning (e.g., a fine, [illegible]
morning). A failure in this attempt is a malapropism, but the risk is worth
taking.

60. In general, little words are better than big words, but sometimes a big
word is indispensable.

61. A word should not be repeated within or near a sentence except for
good reason, such as clarity, emphasis, or connection. This rule does not
apply to articles, prepositions, conjunctions, or pronouns. On a larger
scale, a sentence should not go over the same ground twice.

62. Adjectives and adverbs should be used in moderation.

63. In general, active verbs are better than passive verbs.

64. One should avoid jargon: words and phrases that mean nothing,
unnecessary technical terms, and words too often profaned.

65. One should not mix levels of usage. If a paper is formal, it should not
use colloquial or slang words or constructions. If it is informal, it should
not include words, sentences, and constructions which, in that context,
sound pompous and out of character.

66. A figure of speech should be capable of being reduced to a proportion
that will reveal the intended relationship.

67. Successive figures of speech should be consistent with one another. A
metaphor should not come in like a lion and then proceed to gild the lily.




68. One should realize that all language is metaphorical: that words could
not cover the flux of experience without metaphorical extensions of their
root senses. Figures of speech are not mere ornaments; they are
economical ways of conveying meanings.

69. One should be able to distinguish the literal meaning of a metaphor
from the intended meaning. Since Richards' terms, "vehicle" and
"tenor," have not become current, the terms "literal meaning" and "figurative
meaning" may help to make this distinction.



F. Semantic Considerations 

70. Words should not be used as though they were identical with the
things they represent. "This is X" should be understood as "For present
purposes this may be classified under X because in certain respects, but
not in all, it is like other things that we classify under X."

71. A word usually carries several different meanings. The context should
indicate which of these meanings is intended and should warn a qualified
reader against meanings that are not intended.

72. One should not impute a single, fixed meaning to a word and base a
position upon it when other meanings may be intended or understood.



MEANING AND MEANINGS

Shortly after I. A. Richards became University Professor at Harvard, I
had the privilege of serving for one year as one of his assistants.

He had many distinguished visitors, some of whom questioned his more
paradoxical opinions. One of them said, "I can accept your general position
that any English word can be given almost any meaning by its context, but
surely there are limits. How, for example, could anyone make the word
house mean bread?"

Without hesitation, Richards quoted a line from "The Bugler's First
Communion" by Gerard Manley Hopkins, referring to the communion
bread:

"Hiding in leaf-light house his too huge godhead."

Another visitor said, "I recognize that words have different meanings in
different contexts. For example, in one context the word rest may mean
remainder; in another context it may mean repose. But you seem to be saying
that sometimes a word can carry two such meanings simultaneously. Apart
from puns, which are trivial, how could such a word as rest in a given
context mean both remainder and repose?"

Richards rolled his eyes heavenward for just a moment and then quoted
the dying speech of Hamlet:

"The rest is silence."



73. One should not impute greater specificity of meaning to a word than is
indicated by the context. When a word is used loosely, with several
possible meanings in mind, one should not assume that it is intended to mean
one quite definite thing.

74. One should recognize and allow for shifts in the meanings of words
from one context to another. This is not only inevitable but highly
desirable except within a single train of deductive reasoning.

75. One should be sensitive to the need for a clear definition or
understanding (through context) of crucial terms in statements intended to be
precise, or to lead to important decisions.

76. One should not use or be misled by the trick of securing assent to a
proposition using a key word in one sense, and then extending this
agreement to another proposition using the same word in a different sense.

77. One should not hope to carry meaning solely by a careful choice of
terms. One should also indicate the sense in which one is using them by a
context that makes them unambiguous.

78. One should not stretch the meaning of a term beyond the probable
capacity of one's audience to grasp and retain. One should expect a
common term used in a technical sense to revert many times in the course of a
discussion to its common range of meanings.

79. In dealing with general statements or abstractions one should be able,
if challenged, to point to concrete things or operations on which the
abstractions are based.

80. All language is both "referential" and "emotive": it produces a
response that is a blend of thought and feeling. Neither function is "higher"
than the other; they are inseparable, and a defect in either will impair the
other. Students should watch the emotional coloring of the words they use,
making sure that it is in harmony with the thought, and on the highest
level that the thought will sustain.

G. Argument and Rhetoric 

81. Students should be able to classify arguments as inductive or
deductive and recognize that both are usually involved in persuasive writing.

82. Students should be able to construct an inductive argument with
careful regard to adequacy of sampling, statistical significance (when
necessary), and limitation of the generality of the conclusion.

83. Students should be able to construct a deductive argument with
careful regard to the validity of the premises, the consistency of terms and
propositions, the avoidance of fallacies, and the soundness of each step in
the reasoning.

84. Students should be able to recognize, refute, and avoid common
fallacies.

85. Students should recognize the role of definitions and assumptions in
argument and should be able to bring to light hidden assumptions by
supplying missing premises.

86. Students should be able to adapt an argument to a given occasion and
audience by such means as organization, establishing an appropriate
character for the speaker, modifying the patterns of sentences, using
appropriate words and figures of speech, etc.

H. Style 

87. Students should realize that, in one important sense, style is not the
natural and inevitable expression of a personality in writing but the
gradual discovery and adoption of successful ways of achieving certain
purposes in writing. It becomes habitual and recognizable only to the
extent that the writer's purposes are fairly constant, and he keeps using and
developing the same means of achieving them. This view of style is more
fruitful than the personality theory because it dispels mystery and gives
students something to do besides waiting for their personalities to achieve
their predestined form. They should clarify their purposes in writing and
set about discovering successful ways of achieving them.

88. Students should realize that the selection of details is an important
element of style and is conditioned by the purpose in writing.

89. Students should recognize and be able to produce the effects achieved
by selection of words: by various levels of usage, by concrete
(image-bearing) vs. abstract words, by emotionally charged vs. neutral words, by the
proportion of content to structure words, etc.

90. Students should be able to use appropriate figurative language to
clarify an idea, to add interest, and to intensify emotion.

91. Students should recognize and be able to produce the effects achieved
by various patterns of sentences: long or short, hard or easy to follow,
normal, interrupted, or inverted patterns, few or many connectives of the
various types, etc.



92. Students should be aware of the sound and rhythm of their sentences
when read aloud. They should be able to write sentences in which similar
metrical patterns are repeated, or suddenly changed, for emotional effects.
They should equally avoid harshness and the musical effects of poetry that
are inappropriate to prose. They should watch vowel and consonant
sounds so that there is a pleasing variety without incongruity or
awkwardness.

93. Students should be able to recognize and produce the effect of a
change of pace in the movement of sentences, from quiet and deliberate to
hurried, excited, or passionate.

94. Students should be able to adapt their style to various literary forms
such as parable, fable, dialogue, familiar essay, criticism, scientific report,
etc.

95. Students should be able to adapt their style to their attitude toward
the subject (what Richards calls "tone"): admiration, irony, invective,
objective appraisal, etc.

96. Students should develop a sustained interest in stylistic effects which
they come across in reading, and in discovering the means by which they
were achieved, to the end that they may gradually achieve a readable prose
style of their own. It is almost too much to expect that any students except
born writers will achieve a mature prose style before graduation from
college, but a foundation may be laid and habits may be built toward the
establishment of a mature prose style by the age of thirty.



HUMOR

In a letter commenting on qualities in student writing that he almost
never found in mediocre or bad papers, Professor Macklin Thomas,
formerly Examiner in English at Chicago State College, concluded with the
following point:

"HUMOR. Not clowning, of course, though a good writer must be
allowed to snap at a good trifle; rather a knack for intellectual or dramatic (as
opposed to merely verbal) irony or incongruity (as in New Yorker wordless
cartoons). Standards here would naturally be hard to fix: one teacher's idea
of what is funny often seems [illegible] to another. But adrift on the shoreless
sobriety of student writing, we needn't drive a hard bargain. Any overt
intention, however feebly executed, to indicate that the writer has some
[illegible] for thoughtful amusement should be credited as humor."



Glossary



bias is the influence on grades of irrelevant considerations such as liking or
disliking the student, disagreement with his views, etc.

cluster as used in this booklet is a group of readers whose grades agree within their
group and disagree with the grades of every other group to a greater extent than
can be attributed to chance.

combining scores or grades suggests several procedures for putting essay grades
and objective scores on a common score-scale and combining them in ways that
yield total scores that conform to reasonable expectations.

correlation is a mathematical procedure that shows to what extent it is true that, 
the higher a student stands on one measure, the higher he stands on another. The 
measures need not be of the same characteristic nor on the same scale: one can
correlate height in inches with weight in pounds. But one must correlate two sets of
measures of the same students; there is no way to correlate two groups on the same 
measure. 

The standard but difficult way of computing correlations is called
"product-moment" correlation. Since correlating the grades assigned independently by
different readers is the basic procedure in computing the reliability of essay grades, a
quick and easy way that yields approximately the same results is called
"top-quarter tetrachorics" and is explained for the first time in this booklet.

The correlation between two sets of essay grades for the same students is
regarded as the reliability of one rating. Since one expects to use the sum or average
of both grades as the final grade, this correlation must be "stepped up by the
Spearman-Brown Prophecy Formula" to get the reliability of this set of final
grades. But all the teacher has to do is to compute one percent; then he can look
up the corresponding tetrachoric and the reliability in the table presented in the
section on Computing the Reliability of Essay Grades.
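
As a rough illustration of the two computations named in this entry, the sketch below computes a product-moment correlation between two readers' grades and then steps it up with the Spearman-Brown formula for the sum of two ratings, r2 = 2r / (1 + r). The grades, variable names, and function names are invented for illustration and are not taken from the booklet. The booklet's shortcut, top-quarter tetrachorics, depends on a table that is not reproduced here, so it is not shown.

```python
import math


def product_moment(x, y):
    """Product-moment (Pearson) correlation of two equal-length grade lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def spearman_brown(r, k=2):
    """Reliability of the sum of k comparable ratings, given the
    reliability r of a single rating."""
    return k * r / (1 + (k - 1) * r)


# Hypothetical grades from two readers for the same eight students.
reader_a = [7, 5, 8, 3, 6, 4, 9, 5]
reader_b = [6, 5, 9, 4, 5, 3, 8, 6]

r_one = product_moment(reader_a, reader_b)  # reliability of one rating
r_sum = spearman_brown(r_one)               # reliability of the summed grade
print(round(r_one, 2), round(r_sum, 2))
```

Because r is below 1, the stepped-up figure is always higher than the single-rating correlation, which is why the sum of two independent readings is more dependable than either one alone.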

distribution of scores, grades, etc. usually takes the form of a list of all possible
scores or ratings from high to low with a tally after each score for each student who
made it.
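
A distribution of this kind can be sketched in a few lines. The scores and the 1-to-9 scale below are assumptions made for the example; only the layout (every possible score from high to low, one tally mark per student) follows the definition above.

```python
from collections import Counter


def tally_distribution(scores, lowest, highest):
    """Return (score, count) pairs for every possible score, high to low."""
    counts = Counter(scores)
    return [(s, counts.get(s, 0)) for s in range(highest, lowest - 1, -1)]


# Hypothetical essay scores on a 1-to-9 scale.
scores = [7, 5, 8, 5, 6, 4, 9, 5, 6, 7]

for score, count in tally_distribution(scores, 1, 9):
    # One slash per student who made this score; scores nobody made still get a row.
    print(f"{score:2d} {'/' * count}")
```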






factor analysis as discussed in this booklet is a complex mathematical procedure
that requires a large number of readers to grade copies of a large number of essays
written by the same students. One then computes the correlation between the
grades of each reader and those of every other reader. A computer can then pick
out clusters of readers whose grades agree pretty well within their cluster but
disagree with the grades of every other cluster to a greater extent than can be
attributed to chance. A classification of comments written on these papers by the readers
who best represent each cluster can then reveal the qualities in student writing that
each cluster emphasized, such as ideas, organization, wording, and correctness of
expression. Each of these distinctive emphases is called a factor.

grade-lines are usually lines drawn across a distribution (q.v.) of total scores on an
examination to mark the dividing lines between the final letter grades or their
numerical equivalents. The staff usually tries to make the percentages awarded the
various grades conform to reasonable expectations.

holistic grading or scoring is a term not used in this booklet, but it refers to what is
called "rating on general impression." It consists of giving a single grade or score
to each essay rather than a number of ratings on various qualities. The latter is
called "analytic grading."

independent grades or other measures most commonly refer to the practice of
having each reader record his grades and comments on a separate work sheet and
write nothing on the essays themselves. Thus no reader knows what grade any
other reader has given a paper. The term "independent" was used in a different
sense later, where it was argued that there must be some separation in time as well
as in topic between the writing of two essays to make them genuinely independent
samples of each student's writing. Two short essays written in the same session of
an examination rarely differ more in quality than pages 1 and 2 of the same essay.

Kuder-Richardson Formula 21 is a quick and easy formula for computing the
reliability of objective tests. All you have to know is the mean, the standard
deviation, and the number of items.
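
For readers who want to see the arithmetic, here is a minimal sketch of Formula 21 in its usual textbook form, r = (k / (k - 1)) * (1 - M(k - M) / (k * s^2)), where k is the number of items, M the mean, and s the standard deviation of the raw scores. The function name and the sample figures are assumptions for illustration; the booklet's own worked procedure appears in its section on objective tests and may use shortcuts not shown here.

```python
def kuder_richardson_21(mean, std_dev, n_items):
    """Estimate the reliability of an objective test from its mean,
    standard deviation, and number of items (Kuder-Richardson Formula 21)."""
    k = n_items
    variance = std_dev ** 2
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))


# Hypothetical 50-item test with a mean of 30 and a standard deviation of 8.
reliability = kuder_richardson_21(mean=30, std_dev=8, n_items=50)
print(round(reliability, 2))
```

The formula assumes items of roughly equal difficulty, so the result is best read as an approximation, and usually a conservative one.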

loading as used in this discussion of factor analysis is basically the correlation of
each reader's grades with the central tendency represented by each factor. The
higher his loading on a given factor, the more he has been influenced by the
distinctive emphasis of that cluster of readers.

make-up examination requires no definition because it is almost universally
provided for students who were absent. In this section it is argued that students who
were disappointed in their grade on the regular examination should be allowed to
take the make-up, and whichever grade is higher should stand in the record.

mean as used in this booklet is the same as a mathematical average.

median is the middle score or other measure when they are ranked in order from
high to low.
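
The two definitions above can be checked with Python's standard statistics module. The scores are invented for the example; note that with an even number of scores the module takes the median as the average of the two middle values, a convention the booklet does not spell out.

```python
import statistics

# Hypothetical scores for five students, already ranked from high to low.
scores = [9, 7, 7, 5, 2]

average = statistics.mean(scores)   # sum of the scores divided by their number
middle = statistics.median(scores)  # the score in the middle of the ranking

# With an even number of scores there is no single middle score.
even_middle = statistics.median([8, 6, 4, 2])

print(average, middle, even_middle)
```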

name-slip is the sheet on which each student identifies his paper only by any
number of six figures chosen at random. He includes his name, grade, teacher,
date, and any other information that may be required. These name-slips are
locked up until all grades are turned in, so that no reader has any way of finding
out which student wrote any paper.



normal curve is a curve representing the theoretical distribution of an infinite
number of perfect measures of characteristics that are the product of more than
four independent causes.

normally distributed refers to abilities or characteristics that may reasonably be
expected to occur in the proportions predicted by the normal curve in large popu-
lations. Male and female are obviously not such characteristics, but complex abili-
ties such as reading and writing are.

norm is a term not used in this booklet but is so common in discussions of objec-
tive tests that a definition may be useful. A published test is usually given to a
large, representative sample of the kinds of students for whom the test is intended.
The test manual contains tables showing the percent of students in each grade who
fell below each score on the test.

percentile refers to the percent of students who fell below each score in the tables
of norms described above.
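
The definition of percentile can be sketched in a few lines. The function name and the five scores standing in for a norm sample are hypothetical; a published test would base its tables on a far larger sample.

```python
def percent_below(norm_scores, score):
    """Return the percent of students in the norm sample who fell below score."""
    if not norm_scores:
        raise ValueError("the norm sample is empty")
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)


# Hypothetical norm sample of five raw scores.
sample = [10, 20, 20, 30, 40]
print(percent_below(sample, 30))  # 60.0
```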

positive reinforcement is a term popularized by the Harvard psychologist B. F.
Skinner to refer to the principle that recognizing and rewarding whatever a stu-
dent (or animal) does right usually has a stronger effect on learning than any kind
or amount of punishment of what he does wrong.

random variation usually refers to differences in scores on measures designed to
measure the same ability or characteristic, depending on the sample of tasks in-
cluded in each measure. Students may just happen to be more familiar with or
adept at one sample of such tasks than another.

raw scores are the number of items in objective tests that each student answered
correctly.

reliability is the amount of agreement between two sets of independent measures of
the same characteristic in the same students, taken at about the same time. In ob-
jective tests, it is usually an estimate of how close the students would come to
getting the same scores on a second test of the same kind. It differs from correlation
(q.v.) in that the measures must be designed to measure the same characteristic, while
correlations may be computed between quite different characteristics, such as height
and weight.
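
One common way to put a number on such agreement is the Pearson correlation between the two sets of measures. The sketch below assumes that statistic; the function name and the sample grades are hypothetical, and the booklet's own computing procedure may be arranged differently.

```python
import math


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of grades."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two lists of the same length, two or more items")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    cross = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("each list must contain some variation")
    return cross / math.sqrt(var_a * var_b)


# Hypothetical grades given to the same five essays by two independent readers.
reader_1 = [20, 30, 30, 40, 50]
reader_2 = [25, 30, 35, 35, 50]
print(round(pearson_r(reader_1, reader_2), 2))
```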

Remondino's factor emphasized handwriting and neatness, which did not appear
in our factor analysis because we had to use typed copies of the essays written by
students. Later, when we asked teachers to rate handwritten essays, we added
Remondino's factor to our list.

review of discrepant grades is a procedure in which essays that received far differ-
ent grades from the original readers were referred to a small committee of the most
experienced and trusted readers. They were not told what the original grades were;
they knew only that the grades differed. One member of this committee gave each
of these papers a third independent reading, and clerks substituted this grade for
whichever of the original grades was farther from it. This procedure usually had
the effect of increasing the reliability of the essay grades by at least 10 points.
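
The clerks' substitution rule can be sketched as follows. The function names, the threshold for what counts as "far different," and the handling of a tie (the second original grade is replaced) are all assumptions for illustration; the entry does not specify them.

```python
def needs_review(grade_a, grade_b, threshold):
    """True when two original grades differ by at least a chosen threshold.

    The threshold is hypothetical; the entry does not state one.
    """
    return abs(grade_a - grade_b) >= threshold


def review_discrepant(grade_a, grade_b, third_grade):
    """Substitute the third reading for whichever original grade is farther from it.

    If both original grades are equally far from the third reading, the second
    is replaced (an arbitrary choice made for this sketch).
    """
    if abs(grade_a - third_grade) > abs(grade_b - third_grade):
        return third_grade, grade_b
    return grade_a, third_grade


# Original readers gave 30 and 50; the committee member gave 45.
print(review_discrepant(30, 50, 45))  # (45, 50)
```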

significance of differences is the result of various mathematical procedures that
determine the chances in a hundred or a thousand that differences between scores,
averages, and other measures can reasonably be attributed to chance, given the
standard error (q.v.) of these measures.






spiral order refers to arranging several different forms of a test in such an order
that the student in each class tested will get Form A, the next Form B, and

staff grading is the grading of test essays by all members of the staff of that course
(or that department), usually after some training in grading by common standards.
The essays are identified only by code numbers so that no reader knows which
student wrote any paper. Each essay is graded by two readers, who record their
grades and comments on separate work sheets and write nothing on the essays
themselves, so that no reader knows which grade any essay received from any other
reader.

standard deviation is a special kind of average of the distances (deviations) from
the mean of all scores on any measure. It shows how far the scores are spread out
from the mean. If the scores are normally distributed, about two-thirds will lie
within one standard deviation from the mean, and 95% within two standard devia-
tions. This figure is more important than most teachers realize, since nearly all
computations applied to test scores have the standard deviation in their formulas.
It is also the basis for the standard scores used by many test publishers, in which
scores lying one standard deviation apart may be designated by such numbers as
30, 40, 50, 60, and 70.
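
A short sketch may make the "spread" idea concrete. It assumes the population form of the standard deviation (dividing by the number of scores), which may not be the exact form used in the booklet; the function name and the eight scores are hypothetical.

```python
import statistics


def fraction_within(scores, n_sd):
    """Fraction of scores lying within n_sd standard deviations of the mean."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)  # population form: an assumption of this sketch
    inside = sum(1 for s in scores if abs(s - mean) <= n_sd * sd)
    return inside / len(scores)


# Hypothetical scores with mean 5 and standard deviation 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.pstdev(scores))   # 2.0
print(fraction_within(scores, 1))  # 0.75
print(fraction_within(scores, 2))  # 1.0
```

With only eight scores the fractions are rough; the two-thirds and 95% figures apply to large, normally distributed sets of scores.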

standard error may be thought of as the limits within which scores on any given
measure may vary by chance. If any measurement operation were repeated a large
number of times (without students' learning or forgetting anything), and we kept
averaging the results until we were quite sure what the true measure was, we would
find that about two-thirds of the scores leading up to this final average lay within
one standard error of the true measure, and 95% within two standard errors. This
computation is most important in determining whether differences between the
averages of different groups are "real" or could be attributed to chance. The
formula for the standard error of such averages is the standard deviation divided
by the square root of the number of students.
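
The formula in the last sentence above can be written out directly. The function names and the sample figures are hypothetical, and the two-standard-error band simply applies the 95% figure quoted in the entry.

```python
import math


def standard_error_of_mean(sd, n_students):
    """Standard deviation divided by the square root of the number of students."""
    if n_students < 1:
        raise ValueError("need at least one student")
    return sd / math.sqrt(n_students)


def chance_band(mean, sd, n_students, n_se=2):
    """Limits lying n_se standard errors on either side of an average."""
    se = standard_error_of_mean(sd, n_students)
    return mean - n_se * se, mean + n_se * se


# Hypothetical class: 25 students, average 30, standard deviation 10.
print(standard_error_of_mean(10, 25))  # 2.0
print(chance_band(30, 10, 25))         # (26.0, 34.0)
```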

standard scores are scores based on the standard deviation (q.v.). The standard
scores for test essays recommended in this booklet are 10, 20, 30, 40, and 50, in
which the mean is arbitrarily called 30 and the standard deviation 10. The second
digit in such scores is understood to refer to tenths of the standard deviation.
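
A scale with mean 30 and standard deviation 10 can be sketched as a linear conversion. This is only an illustration under stated assumptions: the function name, the use of the population standard deviation, and the sample scores are hypothetical, and the booklet's own procedure is the one given in its section on Standard Scores for Test Essays.

```python
import statistics


def standard_scores(raw_scores, target_mean=30, target_sd=10):
    """Convert raw scores to a scale with the given mean and standard deviation."""
    mean = statistics.fmean(raw_scores)
    sd = statistics.pstdev(raw_scores)  # population form: an assumption
    if sd == 0:
        raise ValueError("all scores are identical; no spread to rescale")
    return [round(target_mean + target_sd * (s - mean) / sd) for s in raw_scores]


# Hypothetical raw scores with mean 5 and standard deviation 2.
print(standard_scores([2, 4, 4, 4, 5, 5, 7, 9]))
# [15, 25, 25, 25, 30, 30, 40, 50]
```

A score of 25 here lies five tenths of a standard deviation below the mean of 30, which is how the second digit is read.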

stanine is a score scale of 9 points, based on the standard deviation, in which the
mean is 5 and the standard deviation 2, so that each point in this scale covers half a
standard deviation. This scale was widely used by our Armed Forces in World War
II, and for some years we made strenuous efforts to get teachers to adopt it, but
they were so used to thinking in terms of a scale of 5 points that they soon reverted
to it. We then adapted such a scale to standard scores by the procedures discussed
in the section on Standard Scores for Test Essays.
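
The description above (mean 5, standard deviation 2, nine points) suggests a simple linear conversion from a distance in standard deviations. The sketch below follows that description only; the function name is hypothetical, and stanines are conventionally assigned from fixed percentage bands, which this sketch does not attempt.

```python
import math


def stanine_from_z(z):
    """Map a distance from the mean, in standard deviations, to a 1-9 scale.

    Uses mean 5 and standard deviation 2, rounds halves upward, and clips the
    result to the range 1 through 9.
    """
    value = math.floor(5 + 2 * z + 0.5)
    return max(1, min(9, value))


for z in (-3, -0.25, 0, 0.25, 1, 3):
    print(z, stanine_from_z(z))
```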

teachers' predictions may be defined as a procedure in which teachers predict how
many students in each of their classes are likely to make each grade on an ap-
proaching examination. The average of these predictions indicates what percent-
age of students we should expect to make A's, B's, C's, etc. or the numerical equiv-
alents of these grades. These percentages serve to keep readers from straying too
far from the standards and expectations of their colleagues.
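
One way to turn such predictions into expected percentages is to pool the predicted counts across teachers. Pooling is an assumption of this sketch, as are the function name and the sample predictions; averaging each teacher's percentages would be another reasonable reading of the entry.

```python
def expected_percentages(predictions):
    """Pool teachers' predicted counts per grade and convert them to percents.

    predictions -- list of dicts, one per teacher, mapping grade -> predicted count
    """
    totals = {}
    for teacher_prediction in predictions:
        for grade, count in teacher_prediction.items():
            totals[grade] = totals.get(grade, 0) + count
    n_students = sum(totals.values())
    if n_students == 0:
        raise ValueError("no students were predicted")
    return {grade: 100.0 * count / n_students for grade, count in totals.items()}


# Hypothetical predictions from two teachers with 25 students each.
print(expected_percentages([
    {"A": 5, "B": 10, "C": 10},
    {"A": 5, "B": 10, "C": 10},
]))  # {'A': 20.0, 'B': 40.0, 'C': 40.0}
```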

validity is mentioned only twice and not discussed in this booklet since samples of
student writing are direct measures of the ability we wish to measure and hence are
valid by definition. If they were judged by some eccentric standard, such as their



conformity to Marxist doctrines, the grades would no longer be valid measures of
skill. Their validity is seldom questioned, and it has been sustained by numerous
studies showing that they predict what they may reasonably be expected to predict.
Only the sort of test of error correction discussed in Appendix D has been
questioned as a measure of writing ability. Here it is defended solely as a quick and
easy method of measuring familiarity with the rules and conventions of informal
standard English. This is a valid objective in its own right, even if it had nothing to
do with writing ability, but in fact such tests often attain high correlations with
carefully determined grades on essays.

The writer would not defend the short test of knowledge of grammar, as exem-
plified by the "Shadow" test (Appendix D), as a valid test of writing ability. Re-
searchers in several countries over a long period of time have been unable to show
that any kind of grammar, traditional or modern, has any consistent or substantial
effect on writing ability in one's native language. This test measures only what it
purports to measure: ability to describe a sentence in grammatical terms.

weighting is the process of giving some parts of a complex examination more credit
than other parts in arriving at the total score. The weights are usually determined
by discussion and are easily applied by multiplying scores on one part by some-
thing like 1.5, and scores on another part by .8. It was mentioned that researchers
seldom find that weighting makes any serious difference in final grades. Students
tend to come out in about the same rank order regardless of weighting.
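
The multiplication described above, and its usual lack of effect on rank order, can be sketched with a small example. The function names, the three students, their part scores, and the weights are all hypothetical and chosen only for illustration.

```python
def weighted_total(part_scores, weights):
    """Multiply each part score by its weight and add the results."""
    return sum(w * s for w, s in zip(weights, part_scores))


def rank_order(students, weights):
    """Names ordered from highest to lowest weighted total."""
    return sorted(
        students,
        key=lambda name: weighted_total(students[name], weights),
        reverse=True,
    )


# Hypothetical scores on two parts of an examination.
students = {"A": (40, 50), "B": (30, 45), "C": (20, 30)}
print(rank_order(students, (1, 1)))      # unweighted
print(rank_order(students, (1.5, 0.8)))  # weighted
```

In this made-up case both calls give the same order, which is the pattern the entry says researchers usually find.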



